Abstract

We study large deviation properties of systems of weakly interacting particles modeled by Itô stochastic differential equations (SDEs). It is known under certain conditions that the corresponding sequence of empirical measures converges, as the number of particles tends to infinity, to the weak solution of an associated McKean-Vlasov equation. We derive a large deviation principle via the weak convergence approach. The proof, which avoids discretization arguments, is based on a representation theorem, weak convergence and ideas from stochastic optimal control. The method works under rather mild assumptions and also for models described by SDEs not of diffusion type. To illustrate this, we treat the case of SDEs with delay.
1 Introduction

Collections of weakly interacting random processes have long been of interest in statistical physics, and more recently have appeared in problems of engineering and operations research. A simple but important example of such a collection is a group of "particles," each of which evolves according to the solution of an Itô type stochastic differential equation (SDE). All particles have the same functional form for the drift and diffusion coefficients. The coefficients of particle i are, as usual, allowed to depend on the current state of particle i, but they also depend on the current empirical distribution of all particle locations. When the number of particles is large, the contribution of any given particle to the empirical distribution is small, and in this sense the interaction between any two particles is considered "weak."

For various reasons, including model simplification and approximation, one may consider a functional law of large numbers (LLN) limit as the number of particles tends to infinity. The limit behavior of a single particle (under assumptions which guarantee that all particles are in some sense exchangeable) can be described by a two-component Markov process. One component corresponds to the state of a typical particle, while the second corresponds to the limit of the empirical measures. Again using that all particles are exchangeable, under appropriate conditions one can show that the second component coincides with the distribution of the particle component. The limit process, which typically has an infinite dimensional state, is sometimes referred to as a "nonlinear diffusion." Because the particle's own distribution appears in the state dynamics, the partial differential equations that characterize expected values and densities associated with this process are nonlinear, hence the terminology.

In this paper we consider the large deviation properties of the particle system as the number of particles tends to infinity. Thus the deviations we study are those of the empirical measure of the prelimit process from the distribution of the nonlinear diffusion. Of particular interest, and a subject for further study, are deviations when the initial distribution of the single particle in the nonlinear diffusion is invariant under the joint particle/measure dynamics, and related questions of stability for both the limit and prelimit processes.

One of the basic references for large deviation results for weakly interacting diffusions is [10]. This paper considers a system of uniformly nondegenerate diffusions with interaction in the drift term and establishes a large deviation principle for the empirical measure using discretization arguments and careful exponential probability estimates (see Section 7.1). Properties related to a large deviation principle, such as fluctuation theorems, have been studied in [33, 1, 26, 3, 21]. A proof of the large deviation principle for systems with constant diffusion coefficient that is based on a comparison result for a related infinite dimensional Hamilton-Jacobi-Bellman equation appears in [17, Section 13.3].



Later works have developed the theory for a variety of alternative models, including multilevel large deviations [11, 13], jump diffusions [25, 24], discrete-time systems [?, 12], and interacting diffusions with random interaction coefficients [?] or singular interaction [18]. In the current work we develop an approach which is very different from the one taken in any of these papers. Our proofs do not involve any time or space discretization of the system, and no exponential probability estimates are invoked. The main ingredients in the proof are weak convergence methods for functional occupation measures and certain variational representation formulas. Our proofs cover models with degenerate noise and allow for interaction in both drift and diffusion terms. In fact, the techniques are applicable to a wide range of model settings, and an example of stochastic delay equations is considered in Section 7 to illustrate the possibilities.

The starting point of our analysis is a variational representation for moments of nonnegative functionals of a Brownian motion. Using this representation, the proof of the large deviation principle reduces to the study of asymptotic properties of certain controlled versions of the original process. The key step in the proof is to characterize the weak limits of the control and controlled process as the large deviation parameter tends to its limit and under the same scaling that applies to the original process. More precisely, one needs to characterize the limit of the empirical measure of a large collection of controlled and weakly interacting processes. In the absence of control, this characterization problem reduces to an LLN analysis of the original particle system, which has been studied extensively [27, 19, 20]. Our main tools for the study of the controlled analogue are functional occupation measure methods. Indeed, these methods have been found to be quite useful for the study of averaging problems, but where the average is with respect to a time variable [23]. In the problem studied here, the measure-valued processes of interest are obtained by averaging over particles rather than over the time variable.

The approach presented here can be applied to interacting systems driven by general continuous time processes with jumps, provided the systems are scaled in the right way. Indeed, the driving noise process could be a Brownian motion plus an independent Poisson random measure. A key step to make the approach work is a variational representation of Poisson functionals, which has recently been established in [8].

Finally, we remark that variational representations for Brownian motions and Poisson random measures have proved to be useful for the study of small noise large deviation problems, and many recent papers have applied these results to a variety of infinite dimensional small noise systems. A small selection is [?, 30, 31]; see [?] for a more complete list. We expect the current work to be similarly a starting point for the study, using variational representations, of a rather different collection of large deviation problems, namely asymptotics of a large number of interacting particles.

An outline of the paper is as follows. In Section 2 we introduce the interacting SDE particle model, the related controlled and LLN limit versions, and discuss the relevant topologies and sense of uniqueness of solutions. Section 3 discusses the relation between Laplace and large deviation principles, states assumptions and the main result of the paper, and then outlines how this result will be proved using a representation theorem. In Section 4 we describe the martingale problems that will be used in the proof. The proof itself is divided into lower and upper bounds in Sections 5 and 6, respectively. The constructions in the proof are set up to handle a more general case than just the model introduced in Section 2, and in Section 7 we use this generality to state and prove a large deviation theorem for systems with delay. This section also reviews the prior work of [10]. The appendix contains the proof of a technical point that was deferred for reasons of exposition.
2 The model



For each $N\in\mathbb N$, the $N$-particle prelimit model is described in terms of a system of $N$ weakly coupled $d$-dimensional stochastic differential equations (SDEs). The system is considered over the fixed finite interval $[0,T]$. Set $\mathcal X = C([0,T],\mathbb R^d)$ and equip $\mathcal X$ with the maximum norm, which is denoted by $\|\cdot\|$. Similarly, set $\mathcal W = C([0,T],\mathbb R^{d_1})$ and equip $\mathcal W$ with the maximum norm. Let $(\Omega,\mathcal F,P)$ be a probability space and suppose that on this space there is a filtration $(\mathcal F_t)$ satisfying the usual conditions (i.e., $(\mathcal F_t)$ is right-continuous and $\mathcal F_0$ contains all $P$-negligible sets), as well as a collection $\{W^i, i\in\mathbb N\}$ of independent standard $d_1$-dimensional $(\mathcal F_t)$-Wiener processes.

Let $b$ and $\sigma$ be Borel measurable functions defined on $\mathbb R^d\times\mathcal P(\mathbb R^d)$ taking values in $\mathbb R^d$ and the space of real $d\times d_1$-matrices, respectively. If $(S,d_S)$ is a metric space, then $\mathcal P(S)$ denotes the space of probability measures on the Borel $\sigma$-field $\mathcal B(S)$. The space $\mathcal P(S)$ is equipped with the topology of weak convergence, which can be metrized, for example by the bounded Lipschitz metric; when $S$ is Polish, this makes $\mathcal P(S)$ a Polish space.

The evolution of the state of the particles in the $N$-particle model is given by the solution to the system of SDEs

$$dX^{i,N}(t) = b\big(X^{i,N}(t),\mu^N(t)\big)\,dt + \sigma\big(X^{i,N}(t),\mu^N(t)\big)\,dW^i(t),\qquad X^{i,N}(0) = x^{i,N}, \tag{2.1}$$

where $x^{i,N}\in\mathbb R^d$, $i\in\{1,\dots,N\}$, and

$$\mu^N(t) = \frac1N\sum_{i=1}^N\delta_{X^{i,N}(t)}$$

is the empirical measure of $(X^{1,N}(t),\dots,X^{N,N}(t))$ for $t\in[0,T]$. By construction, $\mu^N(t)$ is a $\mathcal P(\mathbb R^d)$-valued random variable. Denote by $\mu^N$ the empirical measure of $(X^{1,N},\dots,X^{N,N})$ over the time interval $[0,T]$, that is, $\mu^N$ is the $\mathcal P(\mathcal X)$-valued random variable defined by

$$\mu^N = \frac1N\sum_{i=1}^N\delta_{X^{i,N}}.$$

Clearly, $\mu^N(t)$ coincides with the marginal of $\mu^N$ at time $t$, i.e., $\mu^N(t) = \mu^N\circ\pi_t^{-1}$, where $\pi_t\colon\mathcal X\to\mathbb R^d$ is the projection map corresponding to the value at time $t$.
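To make the structure of (2.1) concrete, here is a minimal simulation sketch. It assumes $d = d_1 = 1$, an Euler-Maruyama time discretization (which plays no role in the proofs of this paper) and user-supplied coefficients; the measure argument $\mu^N(t)$ is passed as the list of current particle positions, each atom carrying weight $1/N$. The coefficients in the usage lines are hypothetical.

```python
import math
import random


def empirical_mean(atoms):
    """Mean of the empirical measure (1/N) * sum_i delta_{atoms[i]}."""
    return sum(atoms) / len(atoms)


def simulate_particles(b, sigma, x0, T, n_steps, rng):
    """Euler-Maruyama sketch for the N-particle system (2.1) with d = d_1 = 1.

    b(x, mu) and sigma(x, mu) receive a particle position x and the list mu of
    all current positions (the atoms of mu^N(t)). Returns one path per particle,
    i.e. the atoms of the path-space empirical measure mu^N.
    """
    dt = T / n_steps
    positions = list(x0)
    paths = [[x] for x in positions]
    for _ in range(n_steps):
        mu = list(positions)  # freeze mu^N(t) for the current step
        # one fresh Gaussian increment per particle: independent W^i
        positions = [
            x + b(x, mu) * dt + sigma(x, mu) * rng.gauss(0.0, math.sqrt(dt))
            for x in mu
        ]
        for path, x in zip(paths, positions):
            path.append(x)
    return paths


rng = random.Random(0)
paths = simulate_particles(
    b=lambda x, mu: -(x - empirical_mean(mu)),  # hypothetical mean-field attraction
    sigma=lambda x, mu: 0.5,
    x0=[float(i) for i in range(10)], T=1.0, n_steps=100, rng=rng)
mu_T = [p[-1] for p in paths]  # atoms of mu^N(T)
print(empirical_mean(mu_T))
```

With this particular drift the interaction enters only through the empirical mean and the drift terms sum to zero over the particles, so for $\sigma = 0$ the scheme preserves the particle mean up to rounding.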
Our aim is to establish a Laplace principle for the family $\{\mu^N, N\in\mathbb N\}$ of $\mathcal P(\mathcal X)$-valued random variables. When $\frac1N\sum_{i=1}^N\delta_{x^{i,N}}$ converges weakly to $\nu_0$ for some $\nu_0\in\mathcal P(\mathbb R^d)$, the asymptotic behavior of $\mu^N$ as $N$ tends to infinity can be characterized in terms of solutions to the nonlinear diffusion

$$dX(t) = b\big(X(t),\mathrm{Law}(X(t))\big)\,dt + \sigma\big(X(t),\mathrm{Law}(X(t))\big)\,dW(t),\qquad X(0)\sim\nu_0, \tag{2.2}$$

where $W$ is a standard $d_1$-dimensional Wiener process. Thus we are interested in the study of deviations of $\mu^N$, $N$ large, from its typical behavior, namely the probability law of the process solving Eq. (2.2).

In the formulation and proof of the Laplace principle, we will need to consider a controlled version of Eq. (2.1). For $N\in\mathbb N$, let $\mathcal U_N$ be the space of all $(\mathcal F_t)$-progressively measurable functions $u\colon[0,T]\times\Omega\to\mathbb R^{N\times d_1}$ such that

$$E\left[\int_0^T|u(t)|^2\,dt\right] < \infty,$$

where $E$ denotes expectation with respect to $P$ and $|\cdot|$ denotes the Euclidean norm of appropriate dimension. For $u\in\mathcal U_N$, we sometimes write $u = (u_1,\dots,u_N)$, where $u_i$ is the $i$-th block of $d_1$ components of $u$.

Given $u\in\mathcal U_N$, $u = (u_1,\dots,u_N)$, we consider the controlled system of SDEs

$$d\bar X^{i,N}(t) = b\big(\bar X^{i,N}(t),\bar\mu^N(t)\big)\,dt + \sigma\big(\bar X^{i,N}(t),\bar\mu^N(t)\big)u_i(t)\,dt + \sigma\big(\bar X^{i,N}(t),\bar\mu^N(t)\big)\,dW^i(t),\qquad \bar X^{i,N}(0) = x^{i,N}, \tag{2.3}$$

where $\bar\mu^N(t)$ and $\bar\mu^N$ are the empirical measures of $(\bar X^{1,N}(t),\dots,\bar X^{N,N}(t))$ and $(\bar X^{1,N},\dots,\bar X^{N,N})$, respectively:

$$\bar\mu^N(t) = \frac1N\sum_{i=1}^N\delta_{\bar X^{i,N}(t)},\qquad \bar\mu^N = \frac1N\sum_{i=1}^N\delta_{\bar X^{i,N}}.$$

The "barred" symbols in the display above and in Eq. (2.3) refer to objects depending on a control, here $u$. We adopt this as a convention and indicate control-dependent objects by overbars. The existence and uniqueness of strong solutions to Eq. (2.3) will be a consequence of Assumption (A3) made in Section 3; see the comments below Assumption (A5) there.
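In (2.3) the control enters only through the term $\sigma u_i\,dt$, so over one time step it acts like a shift of the noise increment, $\sigma\,(u_i\,\Delta t + \Delta W^i)$; this is the discrete counterpart of the Girsanov argument in the comments following Assumption (A5). The sketch below checks this on an Euler scheme and evaluates the control cost $\frac1{2N}\sum_i\int_0^T|u_i(t)|^2\,dt$ for a piecewise-constant control. The choices $d = d_1 = 1$, the Euler discretization and the coefficients in the test are illustrative assumptions, not taken from the paper.

```python
import math
import random


def euler_from_increments(b, sigma, x0, dt, increments):
    """Euler map for (2.1): driving increments -> particle positions at time T.

    increments[k][i] is the increment of the i-th driving path on step k.
    """
    positions = list(x0)
    for step in increments:
        mu = list(positions)  # atoms of the empirical measure at this step
        positions = [x + b(x, mu) * dt + sigma(x, mu) * dw
                     for x, dw in zip(mu, step)]
    return positions


def controlled_euler(b, sigma, x0, dt, increments, u):
    """Euler scheme for the controlled system (2.3); u[k][i] is u_i on step k."""
    positions = list(x0)
    for k, step in enumerate(increments):
        mu = list(positions)
        positions = [x + b(x, mu) * dt + sigma(x, mu) * u[k][i] * dt
                     + sigma(x, mu) * dw
                     for i, (x, dw) in enumerate(zip(mu, step))]
    return positions


def control_cost(u, dt):
    """(1/(2N)) * sum_i int_0^T |u_i(t)|^2 dt for piecewise-constant u."""
    n_particles = len(u[0])
    return sum(v * v for step in u for v in step) * dt / (2 * n_particles)
```

Feeding `euler_from_increments` the shifted increments `dw + u[k][i] * dt` reproduces `controlled_euler` up to rounding, which is the sense in which the controlled particles are the uncontrolled solution map evaluated along shifted noise.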

It will be convenient to have a path space which is Polish for the components $u_i$, $i\in\{1,\dots,N\}$, of a control process $u\in\mathcal U_N$. We choose the space of deterministic relaxed controls on $\mathbb R^{d_1}\times[0,T]$ with finite first moments. Let us first recall some facts about deterministic relaxed controls; see, for instance, [23, Section 3.2] for the case of a compact space of control actions. Denote by $\mathcal R$ the space of all deterministic relaxed controls on $\mathbb R^{d_1}\times[0,T]$, that is, $\mathcal R$ is the set of all positive measures $r$ on $\mathcal B(\mathbb R^{d_1}\times[0,T])$ such that $r(\mathbb R^{d_1}\times[0,t]) = t$ for all $t\in[0,T]$. If $r\in\mathcal R$ and $B\in\mathcal B(\mathbb R^{d_1})$, then the mapping $[0,T]\ni t\mapsto r(B\times[0,t])$ is absolutely continuous, hence differentiable almost everywhere. Since $\mathcal B(\mathbb R^{d_1})$ is countably generated, the time derivative of $r$ exists almost everywhere and is a measurable mapping $t\mapsto r_t$ from $[0,T]$ to $\mathcal P(\mathbb R^{d_1})$ such that $r(dy\times dt) = r_t(dy)\,dt$.

Denote by $\mathcal R_1$ the space of deterministic relaxed controls with finite first moments, that is,

$$\mathcal R_1 = \left\{r\in\mathcal R : \int_{\mathbb R^{d_1}\times[0,T]}|y|\,r(dy\times dt) < \infty\right\}.$$

By definition, $\mathcal R_1\subset\mathcal R$. The topology of weak convergence of measures turns $\mathcal R$ into a Polish space (not compact in our case). We equip $\mathcal R_1$ with the topology of weak convergence of measures plus convergence of first moments. This topology turns $\mathcal R_1$ into a Polish space, cf. [28, Section 6.3]. It is related to the Monge-Kantorovich distances. For $T = 1$ (else one has to renormalize), the topology coincides with that induced by the Monge-Kantorovich distance with exponent one, also called the Kantorovich-Rubinstein distance or Wasserstein distance of order one. The topology is convenient because the controls appear in an unbounded (but affine) fashion in the dynamics. Thus ordinary weak convergence will not imply convergence of the corresponding integrals, but convergence in $\mathcal R_1$ will.
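The gap between weak convergence and convergence in the first-moment topology is already visible on the real line. In the sketch below (a one-dimensional illustration, not the space $\mathcal R_1$ itself), $\mu_n = (1-1/n)\delta_0 + (1/n)\delta_n$ converges weakly to $\delta_0$, yet its first moment stays equal to one. The order-one Wasserstein distance, computed with the one-dimensional identity $W_1(\mu,\nu) = \int|F_\mu(x)-F_\nu(x)|\,dx$ for the distribution functions $F_\mu$, $F_\nu$, detects this.

```python
def wasserstein1(mu, nu):
    """Order-one Wasserstein distance between finitely supported measures on R.

    mu, nu: lists of (point, weight) pairs, weights summing to one.
    Uses W1 = int |F_mu(x) - F_nu(x)| dx, with F the distribution functions.
    """
    points = sorted({p for p, _ in mu} | {p for p, _ in nu})
    cdf_mu = cdf_nu = total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_mu += sum(w for p, w in mu if p == left)
        cdf_nu += sum(w for p, w in nu if p == left)
        total += abs(cdf_mu - cdf_nu) * (right - left)
    return total


def integral(f, mu):
    """int f d(mu) for a finitely supported measure mu."""
    return sum(w * f(p) for p, w in mu)


def mu_n(n):
    return [(0.0, 1.0 - 1.0 / n), (float(n), 1.0 / n)]


delta0 = [(0.0, 1.0)]
for n in (10, 100, 1000):
    bounded = integral(lambda x: min(abs(x), 1.0), mu_n(n))  # equals 1/n, tends to 0
    print(n, bounded, wasserstein1(mu_n(n), delta0))  # W1 stays at 1 (up to rounding)
```

Integrals of bounded test functions converge, but the unbounded integrand $|x|$ does not, which is the situation the first-moment topology on $\mathcal R_1$ is designed to rule out.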

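Any $\mathbb R^{d_1}$-valued process $v$ defined on some probability space $(\Omega,\mathcal F,P)$ induces an $\mathcal R$-valued random variable $\rho$ according to

$$\rho_\omega(B\times I) = \int_I\delta_{v(t,\omega)}(B)\,dt,\qquad B\in\mathcal B(\mathbb R^{d_1}),\ I\in\mathcal B([0,T]),\ \omega\in\Omega. \tag{2.4}$$

If $v$ is such that $\int_0^T|v(t,\omega)|\,dt < \infty$ for all $\omega\in\Omega$, then the induced random variable $\rho$ takes values in $\mathcal R_1$. If $v$ is progressively measurable with respect to a filtration $(\mathcal F_t)$ in $\mathcal F$, then $\rho$ is adapted in the sense that the mapping $t\mapsto\rho(B\times[0,t])$ is $(\mathcal F_t)$-adapted for all $B\in\mathcal B(\mathbb R^{d_1})$ [?, Section 3.3].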
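For a control that is piecewise constant in time, (2.4) can be written down explicitly: on each time cell the relaxed control is Lebesgue measure in time times a Dirac mass at the current control value, so that $r_t = \delta_{v(t)}$. The sketch below stores such a relaxed control as a list of atoms (with $d_1 = 1$ and a uniform time grid, both assumptions made for illustration) and checks the defining property $r(\mathbb R^{d_1}\times[0,t]) = t$ and the first moment used in the definition of $\mathcal R_1$.

```python
def induced_relaxed_control(values, dt):
    """Discrete version of (2.4) for a scalar control equal to values[k] on
    [k*dt, (k+1)*dt): each atom (t0, t1, y) stands for Lebesgue measure on
    [t0, t1) times the Dirac mass at y."""
    return [(k * dt, (k + 1) * dt, y) for k, y in enumerate(values)]


def mass(rho, in_b, t):
    """rho(B x [0, t]), where the Borel set B is given by its indicator in_b."""
    return sum(max(0.0, min(t1, t) - t0) for t0, t1, y in rho if in_b(y))


def first_moment(rho):
    """int |y| rho(dy x dt); finite for every such rho, so rho lies in R_1."""
    return sum((t1 - t0) * abs(y) for t0, t1, y in rho)


rho = induced_relaxed_control([1.0, -2.0, 0.5, 3.0], dt=0.25)
print(mass(rho, lambda y: True, 0.6))   # total mass up to t = 0.6 equals t = 0.6
print(mass(rho, lambda y: y > 0, 1.0))  # time spent with positive control: 0.75
print(first_moment(rho))                # 0.25 * (1 + 2 + 0.5 + 3) = 1.625
```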
Given an adapted (in the above sense) $\mathcal R_1$-valued random variable $\rho$ and a Borel measurable mapping $\nu\colon[0,T]\to\mathcal P(\mathbb R^d)$, we will consider the controlled SDE

$$dX(t) = b\big(X(t),\nu(t)\big)\,dt + \left(\int_{\mathbb R^{d_1}}\sigma\big(X(t),\nu(t)\big)y\,\rho_t(dy)\right)dt + \sigma\big(X(t),\nu(t)\big)\,dW(t),\qquad X(0)\sim\nu(0), \tag{2.5}$$

where $W$ is a $d_1$-dimensional $(\mathcal F_t)$-adapted standard Wiener process. Eq. (2.5) is a parameterized version of Eq. (2.7) below, the controlled analogue of the limit SDE (2.2). We will only have to deal with weak solutions of Eq. (2.5) or, equivalently, with certain probability measures on $\mathcal B(\mathcal Z)$, where

$$\mathcal Z = \mathcal X\times\mathcal R_1\times\mathcal W.$$

For a typical element in $\mathcal Z$ let us write $(\varphi,r,w)$ with the understanding that $\varphi\in\mathcal X$, $r\in\mathcal R_1$ and $w\in\mathcal W$.

Notice that we include $\mathcal W$ as a component of our canonical space $\mathcal Z$. This will allow identification of the joint distribution of the control and driving Wiener process. Indeed, if the triple $(X,\rho,W)$ defined on some filtered probability space $(\Omega,\mathcal F,P,(\mathcal F_t))$ solves Eq. (2.5) for some measurable $\nu\colon[0,T]\to\mathcal P(\mathbb R^d)$, then the distribution of $(X,\rho,W)$ under $P$ is an element of $\mathcal P(\mathcal Z)$.

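When Eq. (2.5) is used, the mapping $\nu\colon[0,T]\to\mathcal P(\mathbb R^d)$ appearing in the coefficients will be determined by a probability measure on $\mathcal B(\mathcal Z)$. To be more precise, let $\Theta\in\mathcal P(\mathcal Z)$. Then $\Theta$ induces a mapping $\nu_\Theta\colon[0,T]\to\mathcal P(\mathbb R^d)$, which is defined by

$$\nu_\Theta(t)(B) = \Theta\big(\{(\varphi,r,w)\in\mathcal Z : \varphi(t)\in B\}\big),\qquad B\in\mathcal B(\mathbb R^d),\ t\in[0,T]. \tag{2.6}$$

By construction, $\nu_\Theta(t)$ is the distribution under $\Theta$ of the first component of the coordinate process on $\mathcal Z = \mathcal X\times\mathcal R_1\times\mathcal W$ at time $t$. Therefore, if $\Theta$ corresponds to a weak solution of Eq. (2.5) with $\nu = \nu_\Theta$, then $\Theta$ also corresponds to a weak solution of the controlled limit SDE

$$dX(t) = b\big(X(t),\mathrm{Law}(X(t))\big)\,dt + \left(\int_{\mathbb R^{d_1}}\sigma\big(X(t),\mathrm{Law}(X(t))\big)y\,\rho_t(dy)\right)dt + \sigma\big(X(t),\mathrm{Law}(X(t))\big)\,dW(t),\qquad X(0)\sim\nu_\Theta(0). \tag{2.7}$$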
Here $W$ is a $d_1$-dimensional standard Wiener process defined on some probability space $(\Omega,\mathcal F,P)$ carrying a filtration $(\mathcal F_t)$, and $\rho$ is an $(\mathcal F_t)$-adapted $\mathcal R_1$-valued random variable such that $(X,\rho,W)$ has distribution $\Theta$ under $P$. The process triple $(X,\rho,W)$ can be given explicitly as the coordinate process on the probability space $(\mathcal Z,\mathcal B(\mathcal Z),\Theta)$ endowed with the canonical filtration $(\mathcal G_t)$ in $\mathcal B(\mathcal Z)$. More precisely, the processes $X$, $\rho$, $W$ are defined on $(\mathcal Z,\mathcal B(\mathcal Z))$ by

$$X(t,(\varphi,r,w)) = \varphi(t),\qquad \rho(t,(\varphi,r,w)) = r|_{\mathcal B(\mathbb R^{d_1}\times[0,t])},\qquad W(t,(\varphi,r,w)) = w(t).$$

Here we abuse notation and use $\rho(t,\cdot)$ to denote the restriction of a measure defined on $\mathcal B(\mathbb R^{d_1}\times[0,T])$ to $\mathcal B(\mathbb R^{d_1}\times[0,t])$. The canonical filtration is given by

$$\mathcal G_t = \sigma\big((X(s),\rho(s),W(s)) : 0\le s\le t\big),\qquad t\in[0,T].$$

Notice that $\rho(s)$ takes values in the space of deterministic relaxed controls on $\mathbb R^{d_1}\times[0,s]$ with finite first moments.

One of the assumptions we make below (Assumption (A4) in Section 3) is the weak uniqueness of solutions to Eq. (2.7). If $((\Omega,\mathcal F,P),(\mathcal F_t),(X,\rho,W))$ is a weak solution of Eq. (2.7), then $P\circ(X,\rho,W)^{-1}\in\mathcal P(\mathcal Z)$. The property of weak uniqueness can therefore be formulated in terms of probability measures on $\mathcal B(\mathcal Z)$.

Definition 1. Weak uniqueness is said to hold for Eq. (2.7) if whenever $\Theta,\tilde\Theta\in\mathcal P(\mathcal Z)$ are such that $\Theta$ and $\tilde\Theta$ both correspond to weak solutions of Eq. (2.7), $\nu_\Theta(0) = \nu_{\tilde\Theta}(0)$ and $\Theta|_{\mathcal B(\mathcal R_1\times\mathcal W)} = \tilde\Theta|_{\mathcal B(\mathcal R_1\times\mathcal W)}$, then $\Theta = \tilde\Theta$.

Thus, weak uniqueness for Eq. (2.7) means that, given any initial distribution for the state process, the joint distribution of control and driving Wiener process uniquely determines the distribution of the solution triple.

3 Laplace principle 

A function $I\colon\mathcal P(\mathcal X)\to[0,\infty]$ is called a rate function if for each $M<\infty$ the set $\{\theta\in\mathcal P(\mathcal X): I(\theta)\le M\}$ is compact (some authors call such functions good rate functions). We say that a Laplace principle holds for the family $\{\mu^N, N\in\mathbb N\}$ with rate function $I$ if for any bounded and continuous function $F\colon\mathcal P(\mathcal X)\to\mathbb R$,

$$\lim_{N\to\infty}-\frac1N\log E\big[\exp(-N\cdot F(\mu^N))\big] = \inf_{\theta\in\mathcal P(\mathcal X)}\{F(\theta)+I(\theta)\}. \tag{3.1}$$

It is well known that in our setting the Laplace principle holds if and only if $\{\mu^N, N\in\mathbb N\}$ satisfies a large deviation principle with rate function $I$ [16, Section 1.2].
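The Laplace principle (3.1) can be checked numerically in a toy case that is not the particle system of this paper: the sample mean $S_N$ of $N$ i.i.d. standard Gaussians, for which Cramér's theorem gives the rate function $I(x) = x^2/2$ on $\mathbb R$. For the bounded continuous test function $F(x) = \min((x-1)^2, 1)$ one has $\inf_x\{F(x)+I(x)\} = 1/3$, attained at $x = 2/3$. The sketch evaluates the left-hand side of (3.1) by quadrature; the grid size and truncation interval are ad hoc choices.

```python
import math


def F(x):
    """Bounded continuous test function; inf_x {F(x) + x**2 / 2} = 1/3 at x = 2/3."""
    return min((x - 1.0) ** 2, 1.0)


def laplace_lhs(F, N, lo=-5.0, hi=5.0, n_grid=20001):
    """-(1/N) log E[exp(-N F(S_N))], S_N the mean of N iid standard normals.

    S_N ~ Normal(0, 1/N), so the expectation is a one-dimensional integral with
    density sqrt(N / (2 pi)) exp(-N x**2 / 2). It is evaluated by the trapezoidal
    rule after factoring out exp(-N m), m the grid minimum, to avoid underflow.
    """
    h = (hi - lo) / (n_grid - 1)
    g = [F(lo + k * h) + 0.5 * (lo + k * h) ** 2 for k in range(n_grid)]
    m = min(g)
    terms = [math.exp(-N * (gk - m)) for gk in g]
    integral = h * (sum(terms) - 0.5 * (terms[0] + terms[-1]))
    log_e = 0.5 * math.log(N / (2.0 * math.pi)) - N * m + math.log(integral)
    return -log_e / N


for N in (10, 100, 1000):
    print(N, laplace_lhs(F, N))  # approaches 1/3 as N grows
```

The printed values decrease towards $1/3$, with a gap that shrinks roughly in proportion to $1/N$, as Laplace's method predicts for this smooth minimum.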

Let us make the following assumptions about the functions $b$, $\sigma$ and the family $\{x^{i,N}\}\subset\mathbb R^d$ of initial conditions:

(A1) For some $\nu_0\in\mathcal P(\mathbb R^d)$, $\frac1N\sum_{i=1}^N\delta_{x^{i,N}}\to\nu_0$ weakly as $N$ tends to infinity.

(A2) The coefficients $b$, $\sigma$ are continuous.

(A3) For all $N\in\mathbb N$, existence and uniqueness of solutions holds in the strong sense for the system of $N$ equations given by (2.1).

(A4) Weak uniqueness of solutions holds for Eq. (2.7).

(A5) If $u^N\in\mathcal U_N$, $N\in\mathbb N$, are such that
$$\sup_{N\in\mathbb N}E\left[\frac1N\sum_{i=1}^N\int_0^T|u_i^N(t)|^2\,dt\right] < \infty,$$
then $\{\bar\mu^N, N\in\mathbb N\}$ is tight as a family of $\mathcal P(\mathcal X)$-valued random variables, where $\bar\mu^N$ is the empirical measure of the solution to the system of equations (2.3) under $u^N$.



Assumption (A1) is a sort of law of large numbers for the deterministic initial conditions. The assumption is necessary for the convergence of the empirical measures $\mu^N$ associated with the state process. The continuity Assumption (A2) implies that the coefficients $b$, $\sigma$ are uniformly continuous and uniformly bounded on sets $B\times P$, where $B\subset\mathbb R^d$ is bounded and $P\subset\mathcal P(\mathbb R^d)$ is compact.

Assumption (A3) about strong existence and uniqueness of solutions for the prelimit model will be needed to justify a variational representation for the cumulant generating functionals appearing in (3.1); see Eq. (3.3) below. Assumption (A3) and an application of Girsanov's theorem show that Eq. (2.3) has a unique strong solution whenever $\int_0^T|u(t)|^2\,dt\le M$ $P$-almost surely for some $M\in(0,\infty)$. In fact, there is a Borel measurable mapping $h^N = (h_1^N,\dots,h_N^N)$ with $h_i^N\colon\mathcal W^N\to\mathcal X$, $i\in\{1,\dots,N\}$, such that, for $P$-almost all $\omega\in\Omega$, the unique strong solution of (2.1) is given as

$$X^{i,N}(\cdot,\omega) = h_i^N\big(W(\cdot,\omega)\big),\qquad W = (W^1,\dots,W^N),$$

and, under the above integrability condition on $u$, the unique strong solution of (2.3) equals $P$-almost surely

$$\bar X^{i,N} = h_i^N\left(W + \int_0^{\cdot}u(s)\,ds\right).$$

By a localization argument one can now show that (2.3) in fact has a unique strong solution for all $u\in\mathcal U_N$, which is once more given by the above relation.

Weak uniqueness as stipulated in (A4) for the controlled nonlinear diffusions given by Eq. (2.7) is meant in the sense of Definition 1. It is typical that such weak uniqueness holds if it holds for the uncontrolled system (2.2).

Grant Assumption (A1). Then Assumptions (A2)-(A5) are all satisfied if $b$, $\sigma$ are uniformly Lipschitz (with respect to the bounded Lipschitz metric on $\mathcal P(\mathbb R^d)$) or locally Lipschitz satisfying a suitable coercivity condition. A simple example of such a condition on $b$, $\sigma$ would be that for some constant $C>0$, all $x\in\mathbb R^d$ and all $\nu\in\mathcal P(\mathbb R^d)$,

$$2\langle b(x,\nu),x\rangle + \mathrm{tr}(\sigma\sigma^T)(x,\nu)\le C\,(1+|x|^2).$$

Assumption (A5) is stated in this form because there are many different sets of conditions on the problem data (i.e., $b$ and $\sigma$) and the initial conditions which imply tightness of the empirical measures of the $\bar X^{i,N}$. For instance, (A5) is automatically satisfied if the coefficients are bounded. It also holds if $b$, $\sigma$ are Lipschitz continuous. More general conditions can be formulated in terms of the action of the infinitesimal generator associated with Eq. (2.7), given in (4.2) below, on some "Lyapunov function" $\psi\colon\mathbb R^d\to\mathbb R$; also see Subsection 7.1.

For a probability measure $\Theta\in\mathcal P(\mathcal Z)$, recalling that $\mathcal Z = \mathcal X\times\mathcal R_1\times\mathcal W$, let $\Theta_{\mathcal X}$, $\Theta_{\mathcal R}$ denote the first and second marginal, respectively. Let $\mathcal P_\infty$ be the set of all probability measures $\Theta\in\mathcal P(\mathcal Z)$ such that

(i) $\int_{\mathcal R_1}\int_{\mathbb R^{d_1}\times[0,T]}|y|^2\,r(dy\times dt)\,\Theta_{\mathcal R}(dr) < \infty$;

(ii) $\Theta$ corresponds to a weak solution of Eq. (2.7);

(iii) $\nu_\Theta(0) = \nu_0$, where $\nu_0\in\mathcal P(\mathbb R^d)$ is the initial distribution from Assumption (A1).

The main result of this paper is the following. 

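Theorem 3.1. Suppose that Assumptions (A1)-(A5) hold. Then the family of empirical measures $\{\mu^N, N\in\mathbb N\}$ satisfies the Laplace principle with rate function

$$I(\theta) = \inf_{\Theta\in\mathcal P_\infty:\,\Theta_{\mathcal X}=\theta}\ \frac12\int_{\mathcal R_1}\int_{\mathbb R^{d_1}\times[0,T]}|y|^2\,r(dy\times dt)\,\Theta_{\mathcal R}(dr).$$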



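Remark 3.2. The above expression for the rate function $I$ is convenient for proving the Laplace principle. An alternative and perhaps more familiar form of the rate function is the following. By definition of $\mathcal P_\infty$ and since the control appears linearly in the limit dynamics, we can write

$$I(\theta) = \inf_{\Theta\in\mathcal P_\infty:\,\Theta_{\mathcal X}=\theta} E_\Theta\left[\frac12\int_0^T|u(t)|^2\,dt\right],$$

where $\inf\emptyset = \infty$ by convention, $u(t) = \int_{\mathbb R^{d_1}}y\,\rho_t(dy)$, $(X,W,\rho)$ is the canonical process on $(\mathcal Z,\mathcal B(\mathcal Z))$, and $\Theta$-almost surely $X$ satisfies

$$dX(t) = b\big(X(t),\theta(t)\big)\,dt + \sigma\big(X(t),\theta(t)\big)u(t)\,dt + \sigma\big(X(t),\theta(t)\big)\,dW(t), \tag{3.2}$$

with $\theta(t) = \theta\circ\pi_t^{-1}$ the marginal of $\theta$ at time $t$.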
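One way to see why only the mean $u(t) = \int y\,\rho_t(dy)$ of a relaxed control matters in Remark 3.2: the limit dynamics depend on $\rho_t$ only through this mean, while Jensen's inequality gives $\int|y|^2\,\rho_t(dy)\ge\left|\int y\,\rho_t(dy)\right|^2$, so replacing $\rho_t$ by $\delta_{u(t)}$ keeps the dynamics and does not increase the cost. The sketch below compares the two costs for a relaxed control given by finitely many atoms on each cell of a uniform time grid, with $d_1 = 1$; both are illustrative assumptions.

```python
def relaxed_cost(rho_t, dt):
    """(1/2) int int |y|^2 r(dy x dt); rho_t[k] lists (point, weight) pairs
    describing r_t on the k-th time cell, weights summing to one."""
    return 0.5 * dt * sum(w * y * y for cell in rho_t for y, w in cell)


def mean_control(rho_t):
    """u on each time cell: the mean int y r_t(dy) of the relaxed control."""
    return [sum(w * y for y, w in cell) for cell in rho_t]


def ordinary_cost(u, dt):
    """(1/2) int |u(t)|^2 dt for a piecewise-constant ordinary control."""
    return 0.5 * dt * sum(v * v for v in u)


rho_t = [[(1.0, 0.5), (-1.0, 0.5)], [(2.0, 1.0)], [(0.0, 0.25), (4.0, 0.75)]]
dt = 0.5
print(relaxed_cost(rho_t, dt), ordinary_cost(mean_control(rho_t), dt))  # 4.25 3.25
```

Equality holds exactly when every $r_t$ is a Dirac mass, i.e. for relaxed controls induced by ordinary controls as in (2.4).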

The proof of Theorem 3.1 is based on a representation for functionals of Brownian motion, a martingale characterization of weak solutions of Eq. (2.7), and weak convergence arguments.

By Assumption (A3), for each $N$ the $N$-particle system of equations (2.1) possesses a unique strong solution for the given initial condition. By Theorem 3.6 in [?], for any $F\in C_b(\mathcal P(\mathcal X))$ the prelimit expressions in (3.1) can be rewritten as

$$-\frac1N\log E\big[\exp(-N\cdot F(\mu^N))\big] = \inf_{u^N\in\mathcal U_N}\left\{\frac12E\left[\frac1N\sum_{i=1}^N\int_0^T|u_i^N(t)|^2\,dt\right] + E\big[F(\bar\mu^N)\big]\right\}, \tag{3.3}$$

where $\bar\mu^N$ is the empirical measure of the solution to the system of equations (2.3) under $u^N = (u_1^N,\dots,u_N^N)\in\mathcal U_N$. The representation in [?] applies to an infinite dimensional Brownian motion, and thus strictly speaking the infimum would be over a collection of controls indexed by $i\in\mathbb N$. However, since those controls with $i>N$ have no effect on $\bar\mu^N$, we can and will assume they are zero.
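A sanity check on the form of the representation behind (3.3) is possible in the simplest conceivable case: one Brownian motion $W$ on $[0,T]$ and the linear functional $G(W) = aW(T)$. A linear functional is unbounded, so this is only a formal illustration outside the bounded setting. The left side is $-\log E[e^{-aW(T)}] = -a^2T/2$. For a deterministic control $u$, the controlled cost is $E[\frac12\int_0^T|u|^2\,dt + a(W(T)+\int_0^Tu\,dt)] = \int_0^T(\frac12u^2+au)\,dt\ge-a^2T/2$, because $\frac12u^2+au\ge-\frac12a^2$ pointwise, with equality for $u\equiv-a$. The sketch checks this on piecewise-constant controls; the parameter values are arbitrary.

```python
import random


def lhs(a, T):
    """-log E[exp(-a W(T))] = -a**2 T / 2, since W(T) ~ Normal(0, T)."""
    return -0.5 * a * a * T


def controlled_cost(a, u, dt):
    """E[1/2 int u^2 dt + a (W(T) + int u dt)] for a deterministic piecewise-
    constant u (list of values on cells of length dt); uses E[W(T)] = 0."""
    return sum(0.5 * v * v + a * v for v in u) * dt


a, T, n = 1.5, 2.0, 40
dt = T / n
rng = random.Random(2)
trial_costs = [controlled_cost(a, [rng.uniform(-4, 4) for _ in range(n)], dt)
               for _ in range(1000)]
# optimal constant control u = -a attains the left side; random controls do worse
print(lhs(a, T), controlled_cost(a, [-a] * n, dt), min(trial_costs))
```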

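Based on Eq. (3.3), the Laplace principle will be established in two steps. First, in Section 5 we establish the variational lower bound by showing that for any sequence $(u^N)_{N\in\mathbb N}$ with $u^N\in\mathcal U_N$,

$$\liminf_{N\to\infty}\left(\frac12E\left[\frac1N\sum_{i=1}^N\int_0^T|u_i^N(t)|^2\,dt\right] + E\big[F(\bar\mu^N)\big]\right)\ge\inf_{\Theta\in\mathcal P_\infty}\left(\frac12\int_{\mathcal R_1}\int_{\mathbb R^{d_1}\times[0,T]}|y|^2\,r(dy\times dt)\,\Theta_{\mathcal R}(dr) + F(\Theta_{\mathcal X})\right). \tag{3.4}$$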

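Second, in Section 6 we verify the variational upper bound by showing that for any measure $\Theta\in\mathcal P_\infty$ there is a sequence $(u^N)_{N\in\mathbb N}$ with $u^N\in\mathcal U_N$ such that

$$\limsup_{N\to\infty}\left(\frac12E\left[\frac1N\sum_{i=1}^N\int_0^T|u_i^N(t)|^2\,dt\right] + E\big[F(\bar\mu^N)\big]\right)\le\frac12\int_{\mathcal R_1}\int_{\mathbb R^{d_1}\times[0,T]}|y|^2\,r(dy\times dt)\,\Theta_{\mathcal R}(dr) + F(\Theta_{\mathcal X}). \tag{3.5}$$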



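To see that those two steps establish Theorem 3.1, first observe that

$$\begin{aligned}
\inf_{\theta\in\mathcal P(\mathcal X)}\{F(\theta)+I(\theta)\}
&= \inf_{\theta\in\mathcal P(\mathcal X)}\left\{F(\theta) + \inf_{\Theta\in\mathcal P_\infty:\,\Theta_{\mathcal X}=\theta}\frac12\int_{\mathcal R_1}\int_{\mathbb R^{d_1}\times[0,T]}|y|^2\,r(dy\times dt)\,\Theta_{\mathcal R}(dr)\right\}\\
&= \inf_{\Theta\in\mathcal P_\infty}\left\{\frac12\int_{\mathcal R_1}\int_{\mathbb R^{d_1}\times[0,T]}|y|^2\,r(dy\times dt)\,\Theta_{\mathcal R}(dr) + F(\Theta_{\mathcal X})\right\}.
\end{aligned}$$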

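Hence, in view of (3.3), we have to show that for all $F\in C_b(\mathcal P(\mathcal X))$,

$$\lim_{N\to\infty}\inf_{u\in\mathcal U_N}J_F^N(u) = \inf_{\Theta\in\mathcal P_\infty}J_F(\Theta),$$

where

$$J_F^N(u) = \frac12E\left[\frac1N\sum_{i=1}^N\int_0^T|u_i(t)|^2\,dt\right] + E\big[F(\bar\mu^N)\big],\qquad J_F(\Theta) = \frac12\int_{\mathcal R_1}\int_{\mathbb R^{d_1}\times[0,T]}|y|^2\,r(dy\times dt)\,\Theta_{\mathcal R}(dr) + F(\Theta_{\mathcal X}),$$

and $\bar\mu^N$ is the empirical measure of the solution to (2.3) under $u$.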
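Let $\varepsilon>0$. For the lower bound, choose $u^N\in\mathcal U_N$, $N\in\mathbb N$, such that $J_F^N(u^N)\le\inf_{u\in\mathcal U_N}J_F^N(u)+\varepsilon$. Then (3.4) implies that

$$\liminf_{N\to\infty}\inf_{u\in\mathcal U_N}J_F^N(u)\ge\inf_{\Theta\in\mathcal P_\infty}J_F(\Theta)-\varepsilon.$$

For the upper bound, choose a probability measure $\Theta\in\mathcal P_\infty$ such that $J_F(\Theta)\le\inf_{\Theta'\in\mathcal P_\infty}J_F(\Theta')+\varepsilon$. Since $\inf_{u\in\mathcal U_N}J_F^N(u)\le J_F^N(v)$ for any $v\in\mathcal U_N$, (3.5) implies that

$$\limsup_{N\to\infty}\inf_{u\in\mathcal U_N}J_F^N(u)\le\inf_{\Theta'\in\mathcal P_\infty}J_F(\Theta')+\varepsilon.$$

Since $\varepsilon>0$ is arbitrary, the assertion follows.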

There is a technical observation to be made about the probability spaces and filtrations underlying the stochastic control problems, namely that there is a certain flexibility in the choice of the stochastic bases. This flexibility will be needed in establishing the variational upper bound. To be more precise, we note that the representation theorem in [?] holds for any stochastic basis rich enough to carry a sequence of independent standard $(\mathcal F_t)$-Wiener processes. The filtration $(\mathcal F_t)$, which is assumed to satisfy the usual conditions, need not be the filtration induced by the Wiener processes, but may be strictly larger. As a consequence of Assumption (A3), the left-hand side of (3.3) does not depend on the choice of the stochastic basis. The stochastic optimal control problem on the right-hand side of (3.3) can therefore be regarded in the weak sense, i.e., the infimum is taken over all suitable stochastic bases; see Definition 4.2 in [34, p. 64]. The definition of the sets $\mathcal U_N$ and Assumption (A5) are to be understood accordingly.

As a consequence of the weak formulation of the control problems, in the proof of the variational lower bound the control processes $u^N$, the driving Wiener processes $W^1,\dots,W^N$ and thus the empirical measures $\bar\mu^N$ could live on stochastic bases which vary with $N$. While we do not make this variation explicit, it is easy to see that the arguments of Section 5, being weak convergence arguments, do not rely on having a common filtered probability space. The variational upper bound, on the other hand, will be established in Section 6 by taking an arbitrary $\Theta\in\mathcal P_\infty$ and then constructing a sequence of control processes and independent Wiener processes so that (3.5) holds. The prelimit processes will be coordinate processes on a common stochastic basis which, however, will depend on the limit probability measure $\Theta$.
4 Auxiliary constructions



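This section collects useful results for characterizing those probability measures in $\mathcal P(\mathcal Z)$ which correspond to a weak solution of (2.7). Let $\Theta\in\mathcal P(\mathcal Z)$. Recall from (2.6) the definition of the mapping $\nu_\Theta\colon[0,T]\to\mathcal P(\mathbb R^d)$ induced by $\Theta$. The mapping $\nu_\Theta$ is continuous. To check this, take any $t_0\in[0,T]$ and any sequence $(t_n)\subset[0,T]$ such that $t_n\to t_0$. Then for all $f\in C_b(\mathbb R^d)$, the fact that elements of $\mathcal X$ are continuous and the bounded convergence theorem imply

$$\int_{\mathbb R^d}f(x)\,\nu_\Theta(t_n)(dx) = \int_{\mathcal X\times\mathcal R_1\times\mathcal W}f(\varphi(t_n))\,\Theta(d\varphi\times dr\times dw)\ \longrightarrow\ \int_{\mathcal X\times\mathcal R_1\times\mathcal W}f(\varphi(t_0))\,\Theta(d\varphi\times dr\times dw) = \int_{\mathbb R^d}f(x)\,\nu_\Theta(t_0)(dx).$$

Therefore $\nu_\Theta(t_n)\to\nu_\Theta(t_0)$ in $\mathcal P(\mathbb R^d)$. The continuity of $\nu_\Theta$ implies that the set $\{\nu_\Theta(t): t\in[0,T]\}$ is compact in $\mathcal P(\mathbb R^d)$.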

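The question of whether a probability measure $\Theta\in\mathcal P(\mathcal Z)$ corresponds to a weak solution of Eq. (2.7) or, equivalently, of Eq. (2.5) with $\nu = \nu_\Theta$ can be conveniently phrased in terms of an associated local martingale problem. We summarize here the main facts that we will use; see [32], [23, Sect. 4.4] and [22, Sect. 5.4], for instance. Given $f\in C^2(\mathbb R^d\times\mathbb R^{d_1})$, define a real-valued process $(M_f^\Theta(t))_{t\in[0,T]}$ on the probability space $(\mathcal Z,\mathcal B(\mathcal Z),\Theta)$ by

$$M_f^\Theta(t,(\varphi,r,w)) = f(\varphi(t),w(t)) - f(\varphi(0),0) - \int_0^t\int_{\mathbb R^{d_1}}\mathcal A_s^\Theta(f)(\varphi(s),y,w(s))\,r_s(dy)\,ds, \tag{4.1}$$

where for $s\in[0,T]$, $x\in\mathbb R^d$, $y,z\in\mathbb R^{d_1}$,

$$\begin{aligned}
\mathcal A_s^\Theta(f)(x,y,z) ={}& \big\langle b(x,\nu_\Theta(s)) + \sigma(x,\nu_\Theta(s))y,\ \nabla_x f(x,z)\big\rangle + \frac12\sum_{j,k=1}^d(\sigma\sigma^T)_{jk}(x,\nu_\Theta(s))\,\frac{\partial^2f}{\partial x_j\partial x_k}(x,z)\\
&+ \frac12\sum_{l=1}^{d_1}\frac{\partial^2f}{\partial z_l^2}(x,z) + \sum_{k=1}^d\sum_{l=1}^{d_1}\sigma_{kl}(x,\nu_\Theta(s))\,\frac{\partial^2f}{\partial x_k\partial z_l}(x,z).
\end{aligned} \tag{4.2}$$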
The expression involving $\mathcal A_s^\Theta(f)$ in (4.1) is integrated against time and the time derivative measures $r_s$ of any relaxed control $r$. The measures $r_s$ are actually not needed, in that we may use $r(dy\times ds)$ in place of $r_s(dy)\,ds$.
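For $d = d_1 = 1$ the generator of the pair (state, driving Wiener process) under a control value $y$ reads $\mathcal Af = (b+\sigma y)\,\partial_xf + \frac12\sigma^2\,\partial_{xx}f + \frac12\,\partial_{zz}f + \sigma\,\partial_{xz}f$, with $b$ and $\sigma$ evaluated at $(x,\nu_\Theta(s))$. The sketch below evaluates this operator by central finite differences (an arbitrary numerical choice) and compares it with hand computations for some of the monomial test functions listed in Lemma 4.1 below. Since $\nu_\Theta(s)$ is fixed at a given time, only the numbers $b(x,\nu_\Theta(s))$ and $\sigma(x,\nu_\Theta(s))$ are passed in; their values here are hypothetical.

```python
def generator(f, b_val, s_val, x, y, z, h=1e-4):
    """A f(x, y, z) for d = d_1 = 1 via central finite differences.

    b_val, s_val: the numbers b(x, nu(s)) and sigma(x, nu(s)).
    """
    fx = (f(x + h, z) - f(x - h, z)) / (2 * h)
    fxx = (f(x + h, z) - 2 * f(x, z) + f(x - h, z)) / h ** 2
    fzz = (f(x, z + h) - 2 * f(x, z) + f(x, z - h)) / h ** 2
    fxz = (f(x + h, z + h) - f(x + h, z - h)
           - f(x - h, z + h) + f(x - h, z - h)) / (4 * h * h)
    return (b_val + s_val * y) * fx + 0.5 * s_val ** 2 * fxx + 0.5 * fzz + s_val * fxz


b_val, s_val, x, y, z = 0.3, 2.0, 1.0, 0.5, -0.7
# f(x, z) = x z: A f = (b + sigma y) z + sigma (the cross-variation term)
print(generator(lambda x, z: x * z, b_val, s_val, x, y, z), (b_val + s_val * y) * z + s_val)
# f(x, z) = x**2: A f = 2 x (b + sigma y) + sigma**2
print(generator(lambda x, z: x * x, b_val, s_val, x, y, z), 2 * x * (b_val + s_val * y) + s_val ** 2)
```

The test function $z^2$ gives $\mathcal Af = 1$ and $z$ gives $0$, which is how the monomial test functions encode that the last coordinate is a standard Wiener process without drift.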

The key relation, which we formulate as a lemma, is a one-to-one correspondence between weak solutions of Eq. (2.7) and a local martingale problem.

Lemma 4.1. Let $\Theta\in\mathcal P(\mathcal Z)$ be such that $\Theta(\{(\varphi,r,w)\in\mathcal Z : w(0) = 0\}) = 1$. Then $\Theta$ corresponds to a weak solution of Eq. (2.7) if and only if $M_f^\Theta$ is a local martingale under $\Theta$ with respect to the canonical filtration $(\mathcal G_t)$ for all $f\in C^2(\mathbb R^d\times\mathbb R^{d_1})$.

Moreover, in order to show that $\Theta$ corresponds to a weak solution of Eq. (2.7), it is enough to check the local martingale property for those $M_f^\Theta$ where the test function $f$ is a monomial of first or second order, that is, for the test functions

$(x,z)\mapsto x_k$, $k\in\{1,\dots,d\}$; $(x,z)\mapsto x_jx_k$, $j,k\in\{1,\dots,d\}$;
$(x,z)\mapsto z_l$, $l\in\{1,\dots,d_1\}$; $(x,z)\mapsto z_jz_l$, $j,l\in\{1,\dots,d_1\}$;
$(x,z)\mapsto x_kz_l$, $k\in\{1,\dots,d\}$, $l\in\{1,\dots,d_1\}$.

Proof. See for example the proof of Proposition 5.4.6 in [22, p. 315]. Note that since the canonical process on the sample space $(\mathcal Z,\mathcal B(\mathcal Z))$ includes a component which corresponds to the driving Wiener process, there is no need to extend the probability space $(\mathcal Z,\mathcal B(\mathcal Z),\Theta)$ even if the diffusion coefficient $\sigma$ is degenerate. □
Remark 4.2. There is a technical point here concerning the canonical filtration $(\mathcal G_t)$ in $\mathcal B(\mathcal Z)$. That filtration is not necessarily $\Theta$-complete or right-continuous, while in the literature solutions to SDEs are usually defined with respect to filtrations satisfying the usual conditions (i.e., containing all subsets of sets of measure zero and being right-continuous). However, any stochastically continuous and uniformly bounded real-valued process defined on some probability space $(\Omega,\mathcal F,P)$ which is a martingale under $P$ with respect to some filtration $(\mathcal F_t)$ is also a martingale under $P$ with respect to $(\bar{\mathcal F}_{t+})$, where $(\bar{\mathcal F}_t)$ denotes the $P$-augmentation of $(\mathcal F_t)$; see the solution to Exercise 5.4.13 in [22, p. 392]. The filtration $(\bar{\mathcal F}_{t+})$ satisfies the usual conditions. Since the localizing sequence of stopping times for a local martingale can always be chosen in such a way that the corresponding stopped processes are bounded martingales, it follows that if $M_f^\Theta$ is a local martingale under $\Theta$ with respect to $(\mathcal G_t)$, then it is also a local martingale under $\Theta$ with respect to $(\bar{\mathcal G}_{t+})$, where $(\bar{\mathcal G}_t)$ denotes the $\Theta$-augmentation of $(\mathcal G_t)$. The local martingale property of the processes $M_f^\Theta$ under $\Theta$ with respect to the canonical filtration $(\mathcal G_t)$ thus implies that the canonical process on $(\mathcal Z,\mathcal B(\mathcal Z))$ solves Eq. (2.7) under $\Theta$ with respect to the filtration $(\bar{\mathcal G}_{t+})$, which satisfies the usual conditions.

Remark 4.3. The reason why we use a local martingale problem rather than the corresponding martingale problem is that it gives more flexibility in characterizing the convergence of Itô processes which are not necessarily of diffusion type. In Subsection 7.2 we extend the Laplace principle of Theorem 3.1 to interacting systems described by SDEs with delay. In that case, the coefficients $b$, $\sigma$ are progressive functionals; thus, they may depend on the entire trajectory of the solution process up to the current time. An appropriate choice of the stopping times in the local martingale problem gives control over the state process up to the current time and not only at the current time. In particular, the proof of Lemma 5.2 below, where the local martingale problem is used to identify certain limit distributions, continues to work also for the more general model of Subsection 7.2.
5 Variational lower bound

In the proof of the lower bound (3.4) we can assume that

$$\sup_{N\in\mathbb{N}} E\Big[\frac{1}{N}\sum_{i=1}^{N}\int_0^T |u_i^N(t)|^2\,dt\Big] \le 2\|F\|_\infty, \tag{5.1}$$

since otherwise the desired inequality is automatic. Let $(u^N)_{N\in\mathbb{N}}$ be a sequence of control processes such that (5.1) holds. This implies in particular that for $P$-almost all $\omega\in\Omega$, all $N\in\mathbb{N}$ and all $i\in\{1,\dots,N\}$, $\int_0^T|u_i^N(t,\omega)|\,dt<\infty$. Modifying the sequence $(u^N)$ on a set of $P$-measure zero has no impact on the validity of (3.4). Thus, we may assume that $u_i^N(\cdot,\omega)$ has a finite first moment for all $\omega\in\Omega$.

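For each $N\in\mathbb{N}$, define a $\mathcal{P}(\mathcal{Z})$-valued random variable $Q^N$ by

$$Q^N_\omega(B\times R\times D) = \frac{1}{N}\sum_{i=1}^{N}\delta_{\bar X^{i,N}(\cdot,\omega)}(B)\,\delta_{\rho_i^N(\omega)}(R)\,\delta_{W^i(\cdot,\omega)}(D), \tag{5.2}$$

$B\times R\times D\in\mathcal{B}(\mathcal{Z})$, $\omega\in\Omega$, where $\bar X^{i,N}$ is the solution of Eq. (2.3) under $u^N$ and $\rho_i^N$ is the relaxed control induced by $u_i^N$ according to (2.4). Notice that $\rho_i^N(\omega)\in\mathcal{R}_1$. The functional occupation measures $Q^N$, $N\in\mathbb{N}$, just defined are related to the Laplace principle by the fact that

$$E\Big[\frac{1}{2}\,\frac{1}{N}\sum_{i=1}^{N}\int_0^T|u_i^N(t)|^2\,dt + F(\bar\mu^N)\Big] = \int_\Omega\Big(\frac{1}{2}\int_{\mathcal{R}_1}\int_{\mathbb{R}^{d_1}\times[0,T]}|y|^2\,r(dy\times dt)\,Q^N_{\omega,\mathcal{R}}(dr) + F(Q^N_{\omega,\mathcal{X}})\Big)P(d\omega), \tag{5.3}$$

where $Q^N_{\omega,\mathcal{X}}$ and $Q^N_{\omega,\mathcal{R}}$ denote the first and second marginal of $Q^N_\omega\in\mathcal{P}(\mathcal{Z})$, respectively, and we recall that $\mathcal{Z}=\mathcal{X}\times\mathcal{R}_1\times\mathcal{W}$. (Identity (5.3) holds because $Q^N_{\omega,\mathcal{X}}=\bar\mu^N(\omega)$ and $\int_{\mathbb{R}^{d_1}\times[0,T]}|y|^2\,\rho_i^N(dy\times dt)=\int_0^T|u_i^N(t)|^2\,dt$.)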
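The left-hand side of (5.3) is an expectation over the controlled particle system. The Python sketch below draws one sample of the quantity inside that expectation for a toy model; everything model-specific is a hypothetical choice made only for illustration: $d=d_1=1$, $b(x,\mu)=-(x-\int y\,\mu(dy))$, $\sigma\equiv1$, an Euler-Maruyama discretization, and the bounded continuous functional $F(\mu)=\int\min(1,|\varphi(T)|)\,\mu(d\varphi)$.

```python
import math
import random


def simulate_controlled_particles(N, T, n_steps, control, x0=0.0, seed=0):
    """Euler-Maruyama sketch of a controlled weakly interacting system, d = d_1 = 1.

    Hypothetical coefficients: b(x, mu) = -(x - mean of mu) and sigma = 1, so the
    control enters the drift as sigma * u = u. `control(i, t)` returns u_i(t).
    Returns the terminal states and the control energy
    (1/N) * sum_i int_0^T |u_i(t)|^2 dt (left-point Riemann sum).
    """
    rng = random.Random(seed)
    dt = T / n_steps
    x = [x0] * N
    energy = 0.0
    for step in range(n_steps):
        t = step * dt
        mean = sum(x) / N  # first moment of the current empirical measure
        u = [control(i, t) for i in range(N)]
        energy += sum(ui * ui for ui in u) * dt / N
        x = [
            xi - (xi - mean) * dt + ui * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for xi, ui in zip(x, u)
        ]
    return x, energy


def sampled_cost(N, T, n_steps, control, seed=0):
    """One sample of (1/2)(1/N) sum_i int |u_i|^2 dt + F(empirical measure).

    F(mu) = int min(1, |phi(T)|) mu(dphi) is a hypothetical bounded continuous
    functional on path space; it only looks at the terminal time.
    """
    x_T, energy = simulate_controlled_particles(N, T, n_steps, control, seed=seed)
    F = sum(min(1.0, abs(xi)) for xi in x_T) / N
    return 0.5 * energy + F, energy, F


total, energy, F = sampled_cost(50, 1.0, 100, lambda i, t: 2.0)
# energy is 2.0**2 * T = 4.0 up to floating-point rounding
```

Averaging `total` over many seeds approximates the left-hand side of (5.3) for the chosen control; the variational representation then takes the infimum over controls.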

Thanks to Assumption (A5) and the bound (5.1), the first marginals of $(Q^N)_{N\in\mathbb{N}}$ are tight as random measures. The next lemma states that tightness of $(Q^N)_{N\in\mathbb{N}}$ as random measures follows. Thus we are asserting tightness of the measures $\gamma^N\in\mathcal{P}(\mathcal{P}(\mathcal{Z}))$ defined by $\gamma^N(A)=P(Q^N\in A)$, $A\in\mathcal{B}(\mathcal{P}(\mathcal{Z}))$.

Lemma 5.1. The family $(Q^N)_{N\in\mathbb{N}}$ of $\mathcal{P}(\mathcal{Z})$-valued random variables is tight.
Proof. The first marginals of $(Q^N)_{N\in\mathbb{N}}$ are tight by Assumption (A5) and (5.1). Since the third marginals are obviously tight, we need only prove tightness of the second marginals. Observe that

$$g(r) = \int_{\mathbb{R}^{d_1}\times[0,T]}|y|^2\,r(dy\times dt)$$

is a tightness function on $\mathcal{R}_1$, i.e., it is bounded from below and has compact level sets. To verify the last property take $c\in(0,\infty)$ and let $R_c=\{r\in\mathcal{R}_1: g(r)\le c\}$. By Chebyshev's inequality, for all $M>0$,

$$\sup_{r\in R_c} r\big(\{y\in\mathbb{R}^{d_1}: |y|>M\}\times[0,T]\big) \le \frac{c}{M^2}. \tag{$*$}$$

Hence $R_c$ is tight and thus relatively compact as a subset of $\mathcal{R}$. Consequently, any sequence in $R_c$ has a weakly convergent subsequence with limit in $\mathcal{R}$. Let $(r_n)\subset R_c$ be such that $(r_n)$ converges weakly to $r_*$ for some $r_*\in\mathcal{R}$. It remains to show that $r_*$ has finite first moment and that the first moments of $(r_n)$ converge to that of $r_*$. By Hölder's inequality and a version of Fatou's lemma (cf. Theorem A.3.12 in [16, p. 307]),

$$\sqrt{Tc} \ge \liminf_{n\to\infty}\int_{\mathbb{R}^{d_1}\times[0,T]}|y|\,r_n(dy\times dt) \ge \int_{\mathbb{R}^{d_1}\times[0,T]}|y|\,r_*(dy\times dt).$$

Let $M>0$. By ($*$) and Hölder's inequality we have for all $r\in R_c$,

$$\int_{\{y\in\mathbb{R}^{d_1}:|y|>M\}\times[0,T]}|y|\,r(dy\times dt) \le \frac{c}{M}.$$

Therefore, using weak convergence (for all but countably many $M$ the set $\{|y|=M\}\times[0,T]$ is $r_*$-null, so the integrals over $\{|y|\le M\}\times[0,T]$ converge),

$$\limsup_{n\to\infty}\int_{\mathbb{R}^{d_1}\times[0,T]}|y|\,r_n(dy\times dt) \le \frac{c}{M} + \int_{\{y\in\mathbb{R}^{d_1}:|y|\le M\}\times[0,T]}|y|\,r_*(dy\times dt) \le \frac{c}{M} + \int_{\mathbb{R}^{d_1}\times[0,T]}|y|\,r_*(dy\times dt).$$

Since $M>0$ may be arbitrarily big, it follows that

$$\lim_{n\to\infty}\int_{\mathbb{R}^{d_1}\times[0,T]}|y|\,r_n(dy\times dt) = \int_{\mathbb{R}^{d_1}\times[0,T]}|y|\,r_*(dy\times dt).$$

We conclude that $g$ is a tightness function on $\mathcal{R}_1$. Now define a function $G:\mathcal{P}(\mathcal{Z})\to[0,\infty]$ by

$$G(\Theta) = \int_{\mathcal{Z}} g(r)\,\Theta(d\varphi\times dr\times dw).$$

Then $G$ is a tightness function on second marginals in $\mathcal{P}(\mathcal{Z})$; see Theorem A.3.17 in [16, p. 309]. Thus in order to prove tightness of the second marginals of $(Q^N)_{N\in\mathbb{N}}$ (as random measures) it is enough to show that

$$\sup_{N\in\mathbb{N}} E\big[G(Q^N)\big] < \infty.$$

However, since $G(Q^N)=\frac{1}{N}\sum_{i=1}^N\int_0^T|u_i^N(t)|^2\,dt$, this follows directly from (5.1). □
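The two quantitative bounds in the proof, the Chebyshev bound ($*$) and the first-moment tail bound $c/M$, can be checked directly for a relaxed control induced by an ordinary control. A minimal Python sketch, assuming a scalar ($d_1=1$) piecewise-constant control on a uniform time grid; the helper name `relaxed_control_stats` is hypothetical.

```python
def relaxed_control_stats(u_values, dt, M):
    """Quantities from the proof of Lemma 5.1 for a relaxed control induced by
    a piecewise-constant scalar control (d_1 = 1). Hypothetical helper.

    The control equals u_values[j] on [j*dt, (j+1)*dt), and the induced relaxed
    control is rho(dy x dt) = delta_{u(t)}(dy) dt. Returns
      g          = int |y|^2 rho(dy x dt)            (the tightness function),
      tail_mass  = rho({|y| > M} x [0, T]),
      tail_first = int_{|y| > M} |y| rho(dy x dt).
    """
    g = sum(u * u for u in u_values) * dt
    tail_mass = sum(dt for u in u_values if abs(u) > M)
    tail_first = sum(abs(u) * dt for u in u_values if abs(u) > M)
    return g, tail_mass, tail_first


g, mass, first = relaxed_control_stats([0.5, 3.0, -4.0, 1.0], dt=0.25, M=2.0)
# Here g = 6.5625; taking c = g, the bounds c/M^2 and c/M dominate the tails.
assert mass <= g / 2.0**2 and first <= g / 2.0
```

For an induced relaxed control, $g(\rho)$ is just the energy $\int_0^T|u(t)|^2\,dt$, which is why (5.1) controls $E[G(Q^N)]$.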

In the next lemma we identify the limit points of $(Q^N)$ as being weak solutions of Eq. (2.7) with probability one. The proof is similar in spirit to that of Theorem 5.3.1 in [23, p. 102].

Lemma 5.2. Let $(Q^{N_j})_{j\in\mathbb{N}}$ be a weakly convergent subsequence of $(Q^N)_{N\in\mathbb{N}}$. Let $Q$ be a $\mathcal{P}(\mathcal{Z})$-valued random variable defined on some probability space $(\tilde\Omega,\tilde{\mathcal{F}},\tilde P)$ such that $Q^{N_j}\to Q$ in distribution. Then $Q_\omega$ corresponds to a weak solution of Eq. (2.7) for $\tilde P$-almost all $\omega\in\tilde\Omega$.

Proof. Set $I=\{N_j: j\in\mathbb{N}\}$ and write $(Q^n)_{n\in I}$ for $(Q^{N_j})_{j\in\mathbb{N}}$. By hypothesis, $Q^n\to Q$ in distribution.

Recall from Lemma 4.1 in Section 4 that a probability measure $\Theta\in\mathcal{P}(\mathcal{Z})$ with $\Theta(\{(\varphi,r,w)\in\mathcal{Z}: w(0)=0\})=1$ corresponds to a weak solution of Eq. (2.7) if (and only if), for all $f\in C^2(\mathbb{R}^d\times\mathbb{R}^{d_1})$, $M_f^\Theta$ is a local martingale under $\Theta$ with respect to the canonical filtration $(\mathcal{G}_t)$, where $M_f^\Theta$ is defined by (4.1). Moreover, the local martingale property has to be checked only for those $M_f^\Theta$ where the test function $f$ is a monomial of first or second order.

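In verifying the local martingale property of $M_f^\Theta$ when $\Theta=Q_\omega$ for some $\omega\in\tilde\Omega$, we will work with randomized stopping times. Those stopping times live on an extension $(\bar{\mathcal{Z}},\mathcal{B}(\bar{\mathcal{Z}}))$ of the measurable space $(\mathcal{Z},\mathcal{B}(\mathcal{Z}))$ and are adapted to a filtration $(\bar{\mathcal{G}}_t)$ in $\mathcal{B}(\bar{\mathcal{Z}})$, where

$$\bar{\mathcal{Z}} = \mathcal{Z}\times[0,1], \qquad \bar{\mathcal{G}}_t = \mathcal{G}_t\times\mathcal{B}([0,1]), \quad t\in[0,T],$$

and $(\mathcal{G}_t)$ is the canonical filtration in $\mathcal{B}(\mathcal{Z})$. Any random object defined on $(\mathcal{Z},\mathcal{B}(\mathcal{Z}))$ also lives on $(\bar{\mathcal{Z}},\mathcal{B}(\bar{\mathcal{Z}}))$, and no notational distinction will be made.

Let $\lambda$ denote the uniform distribution on $\mathcal{B}([0,1])$. Any probability measure $\Theta$ on $\mathcal{B}(\mathcal{Z})$ induces a probability measure on $\mathcal{B}(\bar{\mathcal{Z}})$ given by $\bar\Theta=\Theta\times\lambda$.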
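For each $k\in\mathbb{N}$, define a stopping time $\tau_k$ on $(\bar{\mathcal{Z}},\mathcal{B}(\bar{\mathcal{Z}}))$ with respect to the filtration $(\bar{\mathcal{G}}_t)$ by setting, for $(z,a)\in\mathcal{Z}\times[0,1]$,

$$\tau_k(z,a) = \inf\{t\in[0,T]: v(z,t) > k+a\},$$

with the convention $\inf\emptyset=T$, where

$$v((\varphi,r,w),t) = \int_{\mathbb{R}^{d_1}\times[0,t]}|y|\,r(dy\times ds) + \sup_{s\in[0,t]}|\varphi(s)| + \sup_{s\in[0,t]}|w(s)|.$$

Note that the mapping $t\mapsto v((\varphi,r,w),t)$ is non-decreasing for all $(\varphi,r,w)\in\mathcal{Z}$. Hence the stopping times have the following properties. The boundedness of $\varphi$ and $w$ (being continuous functions on a compact interval) and the finiteness of $\int_{\mathbb{R}^{d_1}\times[0,T]}|y|\,r(dy\times ds)$ imply that $\tau_k\nearrow T$ as $k\to\infty$ (indeed, $\tau_k=T$ for all $k$ large enough) with probability one under $\bar\Theta$. The second property of note is that the mapping

$$\mathcal{Z}\times[0,1]\ni(z,a)\mapsto\tau_k(z,a)\in[0,T]$$

is continuous with probability one under $\bar\Theta$. To see this, note that for every $z\in\mathcal{Z}$ the set of plateau levels

$$A_z = \{c\in\mathbb{R}_+: v(z,s)=c \text{ for all } s\in[t,t+\delta], \text{ some } t\in[0,T], \text{ some } \delta>0\}$$

is at most countable, since the corresponding plateaus of the non-decreasing function $v(z,\cdot)$ are disjoint intervals of positive length. However, $(z,a)\mapsto\tau_k(z,a)$ fails to be continuous at $(z,a)$ only when $k+a\in A_z$. Therefore, by Fubini's theorem,

$$\bar\Theta\big(\{(z,a)\in\bar{\mathcal{Z}}: \tau_k \text{ discontinuous at } (z,a)\}\big) \le \int_{\bar{\mathcal{Z}}}1_{A_z}(k+a)\,\bar\Theta(dz\times da) = \int_{\mathcal{Z}}\int_{[0,1]}1_{A_z}(k+a)\,\lambda(da)\,\Theta(dz) = 0.$$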
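The randomized stopping time and its only possible discontinuities (at plateau levels of $v$) can be illustrated on a time grid. In the Python sketch below, `v_path` and `tau` are hypothetical helper names; the scalar case $d=d_1=1$, the grid discretization, and the plateau example are assumptions made for illustration, and the grid version of the infimum simply returns the first grid time at which $v$ exceeds the level.

```python
def v_path(dt, u, phi, w):
    """Grid values of v((phi, r, w), t) from the proof of Lemma 5.2 (scalar case).

    Hypothetical helper: phi[j], w[j] are path values at t_j = j*dt, and r is the
    relaxed control induced by the piecewise-constant control u (u[j] on
    [t_j, t_{j+1})). v(t_j) = int_0^{t_j} |u| ds + max_{s<=t_j} |phi| + max_{s<=t_j} |w|.
    """
    values, integral, max_phi, max_w = [], 0.0, 0.0, 0.0
    for j in range(len(phi)):
        max_phi = max(max_phi, abs(phi[j]))
        max_w = max(max_w, abs(w[j]))
        values.append(integral + max_phi + max_w)
        if j < len(u):
            integral += abs(u[j]) * dt  # contribution of [t_j, t_{j+1})
    return values


def tau(times, v_values, k, a, T):
    """Grid version of tau_k(z, a) = inf{t : v(z, t) > k + a}, with inf(empty) = T."""
    for t, v in zip(times, v_values):
        if v > k + a:
            return t
    return T


# A hypothetical continuous, non-decreasing v with a plateau at level 2.0 on [0.2, 0.6].
times = [j / 100 for j in range(101)]


def v_plateau(t):
    if t < 0.2:
        return 10 * t
    if t <= 0.6:
        return 2.0
    return 2.0 + 10 * (t - 0.6)


vs = [v_plateau(t) for t in times]
# With k = 1, the level k + a crosses the plateau value 2.0 at a = 1, where tau jumps.
print(tau(times, vs, 1, 0.99, 1.0), tau(times, vs, 1, 1.0, 1.0))  # 0.2 0.61
```

Away from plateau levels, a small change of $a$ (or of the path) moves the hitting time only slightly, which is the mechanism behind the almost sure continuity of $\tau_k$ under $\bar\Theta$.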

Notice that if $M_f^\Theta$ is a local martingale with respect to $(\bar{\mathcal{G}}_t)$ under $\bar\Theta=\Theta\times\lambda$ with localizing sequence of stopping times $(\tau_k)_{k\in\mathbb{N}}$, then $M_f^\Theta$ is also a local martingale with respect to $(\mathcal{G}_t)$ under $\Theta$ with localizing sequence of stopping times $(\tau_k(\cdot,0))_{k\in\mathbb{N}}$; see Appendix A.1. Thus it suffices to prove the martingale property of $M_f^\Theta$ up to time $\tau_k$ with respect to the filtration $(\bar{\mathcal{G}}_t)$ and the probability measure $\bar\Theta$.

Clearly, the process $M_f^\Theta(\cdot\wedge\tau_k)$ is a $(\bar{\mathcal{G}}_t)$-martingale under $\bar\Theta$ if and only if

$$E_{\Theta\times\lambda}\big[\Psi\cdot\big(M_f^\Theta(t_1\wedge\tau_k) - M_f^\Theta(t_0\wedge\tau_k)\big)\big] = 0 \tag{5.4}$$

for all $t_0,t_1\in[0,T]$ with $t_0\le t_1$, and all $\bar{\mathcal{G}}_{t_0}$-measurable $\Psi\in C_b(\bar{\mathcal{Z}})$.

To verify the martingale property of $M_f^\Theta(\cdot\wedge\tau_k)$ it is enough to check that (5.4) holds for any countable collection of times $t_0,t_1$ which is dense in $[0,T]$ and any countable collection of functions $\Psi\in C_b(\bar{\mathcal{Z}})$ that generates the (countably many) $\sigma$-algebras $\bar{\mathcal{G}}_t$. Recall that the collection of test functions $f$ for which a martingale property must be verified consists of just monomials of degree one or two, and hence is finite. Thus, there is a countable collection $\mathcal{T}\subset\mathbb{N}\times[0,T]^2\times C_b(\bar{\mathcal{Z}})\times C^2(\mathbb{R}^d\times\mathbb{R}^{d_1})$ of test parameters such that if (5.4) holds for all $(k,t_0,t_1,\Psi,f)\in\mathcal{T}$, then $\Theta$ corresponds to a weak solution of Eq. (2.7).

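Let $(k,t_0,t_1,\Psi,f)\in\mathcal{T}$. Define a mapping $\Phi=\Phi_{(k,t_0,t_1,\Psi,f)}$ by

$$\mathcal{P}(\mathcal{Z})\ni\Theta\mapsto\Phi(\Theta) = E_{\Theta\times\lambda}\big[\Psi\cdot\big(M_f^\Theta(t_1\wedge\tau_k) - M_f^\Theta(t_0\wedge\tau_k)\big)\big].$$

We claim that the mapping $\Phi$ is continuous in the topology of weak convergence on $\mathcal{P}(\mathcal{Z})$. To check this, take $\Theta\in\mathcal{P}(\mathcal{Z})$ and any sequence $(\Theta_l)_{l\in\mathbb{N}}\subset\mathcal{P}(\mathcal{Z})$ that converges to $\Theta$. Recall the definitions (4.1) and (4.2). As a consequence of Assumption (A2) and by construction of the stopping time $\tau_k$, the integrand in (5.4) is bounded; thanks to Assumption (A2) and the almost sure continuity of $\tau_k$, it is continuous with probability one under $\bar\Theta=\Theta\times\lambda$. By weak convergence and the mapping theorem [?, p. 21], it follows that

$$E_{\Theta_l\times\lambda}\big[\Psi\cdot\big(M_f^\Theta(t_1\wedge\tau_k) - M_f^\Theta(t_0\wedge\tau_k)\big)\big] \xrightarrow{\,l\to\infty\,} E_{\Theta\times\lambda}\big[\Psi\cdot\big(M_f^\Theta(t_1\wedge\tau_k) - M_f^\Theta(t_0\wedge\tau_k)\big)\big]. \tag{5.5}$$

Since the sequence $(\Theta_l)$ converges to $\Theta$, the set $\{\Theta_l: l\in\mathbb{N}\}\cup\{\Theta\}$ is compact in $\mathcal{P}(\mathcal{Z})$. Recalling (2.6), we find that the set of probability measures $\{\nu_{\Theta_l}(t): l\in\mathbb{N}, t\in[0,T]\}\cup\{\nu_\Theta(t): t\in[0,T]\}$ has compact closure in $\mathcal{P}(\mathbb{R}^d)$. We claim that, together with Assumption (A2) and the construction of $\tau_k$, this implies that

$$\sup_{t\in[0,T],\,z\in\bar{\mathcal{Z}}}\big|M_f^{\Theta_l}(t\wedge\tau_k(z),z) - M_f^\Theta(t\wedge\tau_k(z),z)\big| \xrightarrow{\,l\to\infty\,} 0.$$

To see this, we consider for example the integral corresponding to the first term in the drift, which is

$$\int_0^{t\wedge\tau_k(z)}\big\langle b(\varphi(s),\nu_{\Theta_l}(s)),\nabla_x f(\varphi(s),w(s))\big\rangle\,ds.$$

Since $|\varphi(s)|\le k+1$ and $|w(s)|\le k+1$ for $s<\tau_k(z)$, the assumed continuity properties of $b$ imply that this converges uniformly in $t\in[0,T]$, $z\in\bar{\mathcal{Z}}$ to

$$\int_0^{t\wedge\tau_k(z)}\big\langle b(\varphi(s),\nu_\Theta(s)),\nabla_x f(\varphi(s),w(s))\big\rangle\,ds,$$

and a similar result holds for each of the other terms. Since $\Psi$ is bounded, it follows that

$$E_{\Theta_l\times\lambda}\big[\Psi\cdot\big(M_f^\Theta(t_1\wedge\tau_k) - M_f^\Theta(t_0\wedge\tau_k)\big)\big] - E_{\Theta_l\times\lambda}\big[\Psi\cdot\big(M_f^{\Theta_l}(t_1\wedge\tau_k) - M_f^{\Theta_l}(t_0\wedge\tau_k)\big)\big] \xrightarrow{\,l\to\infty\,} 0.$$

In combination with (5.5) this implies $\Phi(\Theta_l)\to\Phi(\Theta)$.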

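By hypothesis, the sequence $(Q^n)_{n\in I}$ of $\mathcal{P}(\mathcal{Z})$-valued random variables converges to $Q$ in distribution. Hence the mapping theorem and the continuity of $\Phi$ imply that $\Phi(Q^n)\to\Phi(Q)$ in distribution.

Let $n\in I$. By construction of $Q^n$ and Fubini's theorem, for $\omega\in\Omega$,

$$\begin{aligned}
\Phi(Q^n_\omega) &= E_{Q^n_\omega\times\lambda}\big[\Psi\cdot\big(M_f^{Q^n_\omega}(t_1\wedge\tau_k) - M_f^{Q^n_\omega}(t_0\wedge\tau_k)\big)\big]\\
&= \frac{1}{n}\sum_{i=1}^{n}\int_0^1\Psi\big(\bar X^{i,n}(\cdot,\omega),\rho_i^n(\omega),W^i(\cdot,\omega),a\big)\cdot\Big(f\big(\bar X^{i,n}(t_1\wedge\tau_k^{i,n})(\omega),W^i(t_1\wedge\tau_k^{i,n})(\omega)\big) - f\big(\bar X^{i,n}(t_0\wedge\tau_k^{i,n})(\omega),W^i(t_0\wedge\tau_k^{i,n})(\omega)\big)\\
&\qquad - \int_{t_0\wedge\tau_k^{i,n}}^{t_1\wedge\tau_k^{i,n}}\mathcal{A}_s^{\bar\mu^n}(f)\big(\bar X^{i,n}(s,\omega),u_i^n(s,\omega),W^i(s,\omega)\big)\,ds\Big)\,da,
\end{aligned}$$

where $\mathcal{A}_s^{\bar\mu^n}$ is defined according to (4.2) with $\bar\mu^n$ in place of $\nu_\Theta$, and $\tau_k^{i,n}=\tau_k^{i,n}(\omega,a)$ is defined like $\tau_k((\varphi,r,w),a)$ with $\varphi$ replaced by $\bar X^{i,n}(\cdot,\omega)$, $r$ replaced by $\rho_i^n(\omega)$, the relaxed control corresponding to $u_i^n(\cdot,\omega)$, and $w$ replaced by $W^i(\cdot,\omega)$.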

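For all $a\in[0,1]$, by Itô's formula, it holds $P$-almost surely that

$$\begin{aligned}
&f\big(\bar X^{i,n}(t_1\wedge\tau_k^{i,n}),W^i(t_1\wedge\tau_k^{i,n})\big) - f\big(\bar X^{i,n}(t_0\wedge\tau_k^{i,n}),W^i(t_0\wedge\tau_k^{i,n})\big) - \int_{t_0\wedge\tau_k^{i,n}}^{t_1\wedge\tau_k^{i,n}}\mathcal{A}_s^{\bar\mu^n}(f)\big(\bar X^{i,n}(s),u_i^n(s),W^i(s)\big)\,ds\\
&\quad= \int_{t_0\wedge\tau_k^{i,n}}^{t_1\wedge\tau_k^{i,n}}\nabla_x f^T\big(\bar X^{i,n}(s),W^i(s)\big)\,\sigma\big(\bar X^{i,n}(s),\bar\mu^n(s)\big)\,dW^i(s) + \int_{t_0\wedge\tau_k^{i,n}}^{t_1\wedge\tau_k^{i,n}}\nabla_z f^T\big(\bar X^{i,n}(s),W^i(s)\big)\,dW^i(s),
\end{aligned}$$

where $\tau_k^{i,n}=\tau_k^{i,n}(\cdot,a)$, and $\tau_k^{i,n}$, $\bar\mu^n$, $\bar X^{i,n}$, $u_i^n$, $W^i$ are random objects on $(\Omega,\mathcal{F})$.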
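By Fubini's theorem and Jensen's inequality, we have

$$E\big[\Phi(Q^n)^2\big] \le \int_0^1 E\Big[\Big(\frac{1}{n}\sum_{i=1}^{n}\Psi(\cdot,a)\cdot\big(M_f^{i,n}(t_1\wedge\tau_k^{i,n}(\cdot,a)) - M_f^{i,n}(t_0\wedge\tau_k^{i,n}(\cdot,a))\big)\Big)^2\Big]\,da,$$

where, with a slight abuse of notation, $\Psi(\cdot,a)$ stands for $\Psi(\bar X^{i,n},\rho_i^n,W^i,a)$ and $M_f^{i,n}(t)$ for $M_f^{Q^n}(t,(\bar X^{i,n},\rho_i^n,W^i))$. For all $a\in[0,1]$, by the Itô isometry and because $\Psi(\cdot,a)$ is $\mathcal{F}_{t_0}$-measurable and $\tau_k^{i,n}(\cdot,a)$ is a stopping time with respect to $(\mathcal{F}_t)$, it holds that

$$\begin{aligned}
&E\Big[\Big(\frac{1}{n}\sum_{i=1}^{n}\Psi(\cdot,a)\cdot\big(M_f^{i,n}(t_1\wedge\tau_k^{i,n}(\cdot,a)) - M_f^{i,n}(t_0\wedge\tau_k^{i,n}(\cdot,a))\big)\Big)^2\Big]\\
&\quad= E\Big[\Big(\frac{1}{n}\sum_{i=1}^{n}\Psi(\cdot,a)\,1_{\{\tau_k^{i,n}(\cdot,a)>t_0\}}\int_{t_0\wedge\tau_k^{i,n}(\cdot,a)}^{t_1\wedge\tau_k^{i,n}(\cdot,a)}\big(\nabla_z f^T(\bar X^{i,n}(s),W^i(s)) + \nabla_x f^T(\bar X^{i,n}(s),W^i(s))\,\sigma(\bar X^{i,n}(s),\bar\mu^n(s))\big)\,dW^i(s)\Big)^2\Big]\\
&\quad= \frac{1}{n^2}\sum_{i=1}^{n}E\Big[\Psi(\cdot,a)^2\,1_{\{\tau_k^{i,n}(\cdot,a)>t_0\}}\int_{t_0\wedge\tau_k^{i,n}(\cdot,a)}^{t_1\wedge\tau_k^{i,n}(\cdot,a)}\big|\nabla_z f^T(\bar X^{i,n}(s),W^i(s)) + \nabla_x f^T(\bar X^{i,n}(s),W^i(s))\,\sigma(\bar X^{i,n}(s),\bar\mu^n(s))\big|^2\,ds\Big] \xrightarrow{\,n\to\infty\,} 0,
\end{aligned}$$

where the cross terms vanish because the Wiener processes $W^1,\dots,W^n$ are independent.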



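Since $E[\Phi(Q^n)^2]\to0$, the random variables $\Phi(Q^n)$ converge to $0$ in probability; together with $\Phi(Q^n)\to\Phi(Q)$ in distribution, this forces $\Phi(Q)=0$ $\tilde P$-almost surely. It follows that for each $(k,t_0,t_1,\Psi,f)\in\mathcal{T}$ there is a set $Z_{(k,t_0,t_1,\Psi,f)}\in\tilde{\mathcal{F}}$ such that $\tilde P(Z_{(k,t_0,t_1,\Psi,f)})=0$ and

$$\Phi_{(k,t_0,t_1,\Psi,f)}(Q_\omega) = 0 \quad\text{for all } \omega\in\tilde\Omega\setminus Z_{(k,t_0,t_1,\Psi,f)}.$$

Let $Z$ be the union of all sets $Z_{(k,t_0,t_1,\Psi,f)}$, $(k,t_0,t_1,\Psi,f)\in\mathcal{T}$. Since $\mathcal{T}$ is countable, we have $Z\in\tilde{\mathcal{F}}$, $\tilde P(Z)=0$ and

$$\Phi_{(k,t_0,t_1,\Psi,f)}(Q_\omega) = 0 \quad\text{for all } \omega\in\tilde\Omega\setminus Z,\ (k,t_0,t_1,\Psi,f)\in\mathcal{T}.$$

It follows that $Q_\omega$ corresponds to a weak solution of Eq. (2.7) for $\tilde P$-almost all $\omega\in\tilde\Omega$. □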
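The decisive estimate in the proof above is that the mean square of an average of $n$ stochastic integrals driven by independent Wiener processes is of order $1/n$, because the cross terms vanish. A Monte Carlo sketch in Python, assuming a hypothetical bounded integrand $\cos(W(s))$ as a stand-in for the integrands of the proof, Brownian motions started at zero at time $t_0$, and left-endpoint (Itô) Riemann sums; the function name is our own.

```python
import math
import random


def mean_square_of_average(n, reps, n_steps, t0=0.0, t1=1.0, seed=0):
    """Monte Carlo estimate of E[((1/n) sum_i I_i)^2], where
    I_i = int_{t0}^{t1} cos(W^i(s)) dW^i(s) for independent Brownian motions W^i
    (each started at 0 at time t0, a simplifying assumption).

    cos(.) is a hypothetical bounded stand-in for the integrands in the proof of
    Lemma 5.2; the Ito sums evaluate the integrand at left endpoints.
    """
    rng = random.Random(seed)
    dt = (t1 - t0) / n_steps
    total = 0.0
    for _ in range(reps):
        avg = 0.0
        for _ in range(n):
            w, integral = 0.0, 0.0
            for _ in range(n_steps):
                dw = rng.gauss(0.0, math.sqrt(dt))
                integral += math.cos(w) * dw  # Ito sum: integrand at left endpoint
                w += dw
            avg += integral / n
        total += avg * avg
    return total / reps


# By the Ito isometry and independence, the exact value is at most (t1 - t0) / n.
for n in (1, 10, 50):
    print(n, mean_square_of_average(n, reps=400, n_steps=10))
```

The printed estimates should shrink roughly like $1/n$, mirroring the factor $1/n^2$ in front of a sum of $n$ bounded terms in the proof.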

The function $F$ in (3.4) is bounded and continuous. The variational lower bound now follows from Eq. (5.3), Lemmas 5.1 and 5.2, Fatou's lemma and the definition of $I$.



6 Variational upper bound

Let $\Theta\in\mathcal{P}_\infty$. We will construct a sequence $(u^N)_{N\in\mathbb{N}}$ with $u^N\in\mathcal{U}_N$ on a common stochastic basis such that (3.5) holds:

$$\limsup_{N\to\infty}E\Big[\frac{1}{2}\,\frac{1}{N}\sum_{i=1}^{N}\int_0^T|u_i^N(t)|^2\,dt + F(\bar\mu^N)\Big] \le \frac{1}{2}\int_{\mathcal{R}_1}\int_{\mathbb{R}^{d_1}\times[0,T]}|y|^2\,r(dy\times dt)\,\Theta_{\mathcal{R}}(dr) + F(\Theta_{\mathcal{X}}).$$

Let $(X,\rho,W)$ be the canonical process on $\mathcal{Z}$ (cf. the end of Section 2). Then $((\mathcal{Z},\mathcal{B}(\mathcal{Z}),\Theta),(\mathcal{G}^\Theta_{t+}),(X,\rho,W))$ is a weak solution of Eq. (2.7). The filtration $(\mathcal{G}^\Theta_{t+})$ satisfies the usual conditions, where $(\mathcal{G}^\Theta_t)$ denotes the $\Theta$-augmentation of the canonical filtration $(\mathcal{G}_t)$ (cf. Section 4).

Since the relaxed control process $\rho$ appears linearly in Eq. (2.7), it corresponds, as far as the dynamics are concerned, to an ordinary $(\mathcal{G}^\Theta_{t+})$-adapted process $u$, namely

$$u(t,\omega) = \int_{\mathbb{R}^{d_1}}y\,\rho_{\omega,t}(dy), \quad t\in[0,T],\ \omega\in\mathcal{Z},$$

where $\rho_{\omega,t}$ is the derivative measure of $\rho_\omega$ at time $t$. For the associated costs, by Jensen's inequality,

$$\frac{1}{2}E_\Theta\Big[\int_0^T|u(t)|^2\,dt\Big] = \frac{1}{2}E_\Theta\Big[\int_0^T\Big|\int_{\mathbb{R}^{d_1}}y\,\rho_t(dy)\Big|^2\,dt\Big] \le \frac{1}{2}E_\Theta\Big[\int_{\mathbb{R}^{d_1}\times[0,T]}|y|^2\,\rho(dy\times dt)\Big],$$

whence $u$ performs at least as well as $\rho$. Let $\hat\rho$ be the relaxed control random variable corresponding to $u$ according to (2.4). In general, $\hat\rho\neq\rho$. However, since both $(X,\rho,W)$ and $(X,\hat\rho,W)$ are solutions of Eq. (2.7) under $\Theta$, and since the costs associated with $u$, and thus with $\hat\rho$, never exceed the costs associated with $\rho$, we may and will assume that $\hat\rho=\rho$.
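The Jensen step replaces, at each time $t$, the measure $\rho_t$ by its barycenter $\int y\,\rho_t(dy)$, which can only lower the quadratic cost; for a point mass (an ordinary control) nothing is lost. A minimal Python sketch for a discrete measure on $\mathbb{R}^{d_1}$, assuming points are given as equal-length tuples; the function name is our own.

```python
def mean_and_second_moment(weights, points):
    """For the discrete probability measure sum_j weights[j] * delta_{points[j]}
    on R^{d_1} (points are equal-length tuples), return
      |int y mu(dy)|^2   and   int |y|^2 mu(dy).
    Jensen's inequality says the first never exceeds the second.
    """
    dim = len(points[0])
    mean = [sum(w * p[c] for w, p in zip(weights, points)) for c in range(dim)]
    mean_sq = sum(m * m for m in mean)
    second = sum(w * sum(c * c for c in p) for w, p in zip(weights, points))
    return mean_sq, second


# A "chattering" measure split between +1 and -1 has barycenter 0: cost 0 versus 1.
print(mean_and_second_moment([0.5, 0.5], [(1.0, 0.0), (-1.0, 0.0)]))  # (0.0, 1.0)
```

This is why, for the upper bound, it costs nothing to work with the ordinary control $u$ instead of the relaxed control $\rho$.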

Define a probability space $(\Omega_\infty,\mathcal{F}^\infty,P_\infty)$ together with a filtration $(\mathcal{F}^\infty_t)$ as the countably infinite product of $(\mathcal{Z},\mathcal{B}(\mathcal{Z}),\Theta)$ and $(\mathcal{G}^\Theta_{t+})$, respectively. For a typical element of $\Omega_\infty$ let us write $\omega=(\omega_1,\omega_2,\dots)$. For $i\in\mathbb{N}$ define

$$W^{i,\infty}(t,\omega) = W(t,\omega_i), \quad u_i^\infty(t,\omega) = u(t,\omega_i), \quad \omega\in\Omega_\infty,\ t\in[0,T].$$

Let $\rho^{i,\infty}$ be the relaxed control random variable corresponding to $u_i^\infty$. By construction, $(\rho^{i,\infty},W^{i,\infty})$, $i\in\mathbb{N}$, are independent and identically distributed with common distribution the same as that of $(\rho,W)$. In particular, $W^{i,\infty}$, $i\in\mathbb{N}$, are independent $d_1$-dimensional standard Wiener processes.

For $N\in\mathbb{N}$, let $\bar X^{1,N},\dots,\bar X^{N,N}$ be the solution to the system of SDEs

$$d\bar X^{i,N}(t) = b\big(\bar X^{i,N}(t),\bar\mu^N(t)\big)\,dt + \sigma\big(\bar X^{i,N}(t),\bar\mu^N(t)\big)\,u_i^\infty(t)\,dt + \sigma\big(\bar X^{i,N}(t),\bar\mu^N(t)\big)\,dW^{i,\infty}(t), \quad \bar X^{i,N}(0)=x^{i,N},$$

where $\bar\mu^N(t)$ is the empirical measure of $\bar X^{1,N}(t),\dots,\bar X^{N,N}(t)$. Thus, $\bar X^{i,N}$ solves Eq. (2.3), with control $u^N=(u_1^\infty,\dots,u_N^\infty)$ and the same deterministic initial condition as before, but on a different stochastic basis.

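For each $N\in\mathbb{N}$ define, in analogy with (5.2), a $\mathcal{P}(\mathcal{Z})$-valued random variable according to

$$Q^N_\omega(B\times R\times D) = \frac{1}{N}\sum_{i=1}^{N}\delta_{\bar X^{i,N}(\cdot,\omega)}(B)\,\delta_{\rho^{i,\infty}(\omega)}(R)\,\delta_{W^{i,\infty}(\cdot,\omega)}(D),$$

$B\times R\times D\in\mathcal{B}(\mathcal{Z})$, $\omega\in\Omega_\infty$. In analogy with (5.3) we have

$$E_\infty\Big[\frac{1}{2}\,\frac{1}{N}\sum_{i=1}^{N}\int_0^T|u_i^\infty(t)|^2\,dt + F(\bar\mu^N)\Big] = \int_{\Omega_\infty}\Big(\frac{1}{2}\int_{\mathcal{R}_1}\int_{\mathbb{R}^{d_1}\times[0,T]}|y|^2\,r(dy\times dt)\,Q^N_{\omega,\mathcal{R}}(dr) + F(Q^N_{\omega,\mathcal{X}})\Big)P_\infty(d\omega). \tag{6.1}$$

Since $(\rho^{i,\infty},W^{i,\infty})$, $i\in\mathbb{N}$, are i.i.d., the second and third components of $(Q^N)_{N\in\mathbb{N}}$ are tight. Tightness of the first component is an immediate consequence of Assumption (A5). Thus, $(Q^N)_{N\in\mathbb{N}}$ is tight as a family of $\mathcal{P}(\mathcal{Z})$-valued random variables.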

Let $Q$ be any limit point of $(Q^N)_{N\in\mathbb{N}}$ defined on some probability space $(\tilde\Omega,\tilde{\mathcal{F}},\tilde P)$. By Lemma 5.2 and its proof, it follows that, for $\tilde P$-almost all $\omega\in\tilde\Omega$, $Q_\omega$ corresponds to a weak solution of Eq. (2.7). Moreover, since $(\rho^{i,\infty},W^{i,\infty})$, $i\in\mathbb{N}$, are i.i.d. with common distribution (under $P_\infty$) the same as that of $(\rho,W)$ (under $\Theta$), Varadarajan's theorem [15, p. 399] implies that, for $\tilde P$-almost all $\omega\in\tilde\Omega$,

$$Q_{\omega,(\mathcal{R}_1\times\mathcal{W})} = \Theta\circ(\rho,W)^{-1},$$

that is, the joint distribution of the second and third components of the canonical process on $\mathcal{Z}$ under a typical $Q_\omega$ equals the joint distribution of the control and Wiener process with which we started.

By Assumption (A4), weak sense uniqueness holds for Eq. (2.7). Therefore, for $\tilde P$-almost all $\omega\in\tilde\Omega$,

$$Q_\omega = \Theta\circ(X,\rho,W)^{-1}.$$

In view of Eq. (6.1), the above identification of the limit points establishes (3.5), the variational upper bound.
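Varadarajan's theorem says that empirical measures of i.i.d. samples converge weakly, almost surely, to the common law. As a one-dimensional illustration only, assuming Uniform(0,1) samples as a stand-in for the law of $(\rho,W)$ (which in the paper lives on the Polish space $\mathcal{R}_1\times\mathcal{W}$), the Python sketch below measures the Kolmogorov distance between the empirical CDF and the true CDF; the function name is our own.

```python
import random


def ks_distance_uniform(samples):
    """Kolmogorov distance sup_x |F_n(x) - x| between the empirical CDF of
    samples in [0, 1] and the Uniform(0, 1) CDF."""
    xs = sorted(samples)
    n = len(xs)
    return max(max((j + 1) / n - x, x - j / n) for j, x in enumerate(xs))


rng = random.Random(0)
for n in (10, 100, 10_000):
    print(n, ks_distance_uniform([rng.random() for _ in range(n)]))
```

The distance decreases roughly like $n^{-1/2}$; in one dimension, uniform convergence of CDFs implies the weak convergence asserted by Varadarajan's theorem.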



7 Remarks and extensions

A feature of the weak convergence approach to large deviations is its flexibility. To illustrate this point, we show in Subsection 7.2 how to extend the Laplace principle established in Theorem 3.1 to weakly interacting systems described by stochastic delay (or functional) differential equations. First, in Subsection 7.1, we compare our result to the classical large deviation principle (LDP) established in [10].

7.1 Comparison with existing results

In this subsection we compare our results with the now classical work [10]. One of the main assumptions in the latter work is the non-degeneracy of the diffusion coefficient $\sigma$. Although the expression for the rate function is well-defined even if the diffusion matrix $\sigma\sigma^T$ is not invertible, the assumption of non-degeneracy is important in the proof of the LDP. Additionally, weak interaction is allowed only through the drift term. Proofs proceed by first establishing a local version of the LDP, which is then lifted to a global result using careful exponential probability estimates.

The approach taken in the current paper does not require any exponential estimates, and the proofs cover the setting of a degenerate $\sigma$ and models with weak interactions in both the drift and the diffusion coefficient. The significant additional assumption made in the current work over [10] is (A3): we require strong existence and uniqueness of solutions to Eq. (2.1), whereas the cited paper only assumes weak existence and uniqueness.

Of somewhat lesser significance is the difference in the topology considered on $\mathcal{P}(\mathbb{R}^d)$ and the space over which the LDP is formulated. In particular, in [10] the drift coefficient $b$ need not be continuous on the entire product space $\mathbb{R}^d\times\mathcal{P}(\mathbb{R}^d)$, where $\mathcal{P}(\mathbb{R}^d)$ is equipped with the topology of weak convergence, but only on $\mathbb{R}^d\times\mathcal{M}_\infty$, where $\mathcal{M}_\infty$ is a set of probability measures on $\mathcal{B}(\mathbb{R}^d)$ which satisfy certain moment bounds in terms of a "Lyapunov function" $\varphi:\mathbb{R}^d\to\mathbb{R}$. The set $\mathcal{M}_\infty$ is equipped with the "inductive" topology induced by $\varphi$ [10, Section 5.1]. Additional assumptions in terms of this Lyapunov function are imposed, which in particular ensure that $(\mu^N(t))_{0\le t\le T}$ is an $\mathcal{M}_\infty$-valued process with continuous sample paths (see (B.2)-(B.4) in [10, Section 5.1]). With some additional work, we can relax Assumption (A2) on the continuity of $b$, $\sigma$ in their second argument and, under Lyapunov function conditions analogous to (B.2)-(B.4), obtain an LDP in a space similar to the one used by [10], namely $C([0,T],\mathcal{M}_\infty)$. A minor difficulty, with the approach taken here, in working with $\mathcal{M}_\infty$ is that the inductive topology is not metrizable. However, one can proceed as follows. Let $\mathcal{P}_\lambda(\mathbb{R}^d)$ be the set of all probability measures $\nu\in\mathcal{P}(\mathbb{R}^d)$ such that $\int\lambda(x)\,\nu(dx)<\infty$, where $\lambda(x)=|x|\,k_0(|x|,|x|)$ for some (suitable) symmetric, continuous, non-negative and non-decreasing function $k_0$ [cf. 28, p. 123]. The topology of $\lambda$-weak convergence, i.e., weak convergence plus convergence of $\lambda$-moments, makes $\mathcal{P}_\lambda(\mathbb{R}^d)$ a Polish space; cf. Theorems 6.3.1 and 6.3.3 in [28, pp. 130-134]. Instead of (A2), we would assume that $b$, $\sigma$ are continuous as functions defined on $\mathbb{R}^d\times\mathcal{P}_\lambda(\mathbb{R}^d)$, with $\mathcal{P}_\lambda(\mathbb{R}^d)$ carrying the topology of $\lambda$-weak convergence. The function $\lambda$ plays the role of the Lyapunov function $\varphi$ used in [10, Section 5.1]. The only further modification would concern Assumption (A5). In addition to tightness of the sequences of empirical measures $(\bar\mu^N)$, one would have to guarantee that the time marginals $\bar\mu^N(t)$ stay in $\mathcal{P}_\lambda(\mathbb{R}^d)$. An appropriate condition (which would be analogous to conditions (B.2)-(B.4) in [10, Section 5.1]) could be formulated in terms of the Lyapunov function.



The expression for the rate function as given in Eq. (1.5) in [10] is different from the form given in Theorem 3.1 of the current paper. For simplicity we consider the case where $\sigma$ is the identity matrix. The rate function (called "action functional" in [10]) $S$ is given by

$$S(\theta(\cdot)) = \frac{1}{2}\int_0^T\sup_{f\in\mathcal{D}:\,\langle\theta(t),|\nabla f|^2\rangle\neq0}\frac{\big|\langle\dot\theta(t)-\mathcal{L}(\theta(t))^*\theta(t),f\rangle\big|^2}{\langle\theta(t),|\nabla f|^2\rangle}\,dt \tag{7.1}$$

if $\theta(\cdot):[0,T]\to\mathcal{M}_\infty$ is absolutely continuous, and $S(\theta(\cdot))=\infty$ otherwise. Here $\mathcal{D}$ is the Schwartz space of test functions $\mathbb{R}^d\to\mathbb{R}$ with continuous derivatives of all orders and compact support, and $\mathcal{L}(\theta(t))^*$ is the formal adjoint of the generator $\mathcal{L}(\theta(t))$, which operates on $f\in\mathcal{D}$ according to

$$\mathcal{L}(\theta(t))(f)(x) = \langle b(x,\theta(t)),\nabla f(x)\rangle + \frac{1}{2}\sum_{j=1}^{d}\frac{\partial^2 f}{\partial x_j^2}(x).$$

Probability measures on $\mathcal{B}(\mathbb{R}^d)$ are interpreted as elements of $\mathcal{D}'$, the Schwartz space of distributions consisting of all continuous linear functionals on $\mathcal{D}$. Absolute continuity of $\theta(\cdot)$ and the time derivatives $\dot\theta(t)$ are defined accordingly. With an abuse of notation, for $\psi\in\mathcal{D}'$ and $f\in\mathcal{D}$, $\psi(f)$ is written as $\langle\psi,f\rangle$. The operator $\mathcal{L}(\theta(t))^*$ maps elements of $\mathcal{D}'$ to $\mathcal{D}'$.

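As mentioned in Remark 3.2, for the special case where $\sigma$ is the identity matrix the family $\{\mu^N(\cdot),N\in\mathbb{N}\}$ satisfies a large deviation principle with rate function

$$I(\theta) = \inf_{\Theta\in\mathcal{P}(\mathcal{Z}):\,\Theta_{\mathcal{X}}=\theta}E_\Theta\Big[\frac{1}{2}\int_0^T|u(t)|^2\,dt\Big], \tag{7.2}$$

where $\inf\emptyset=\infty$ by convention, $u(t)=\int_{\mathbb{R}^{d_1}}y\,\rho_t(dy)$, $(X,W,\rho)$ is the canonical process on $(\mathcal{Z},\mathcal{B}(\mathcal{Z}))$, and the infimum is over those $\Theta$ under which, $\Theta$-almost surely, $X$ satisfies

$$dX(t) = b(X(t),\theta(t))\,dt + u(t)\,dt + dW(t).$$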

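In order to see the relation between the rate functions (7.1) and (7.2) we proceed, somewhat formally, as follows. Let $\Theta\in\mathcal{P}_\infty$ be such that, for some measurable function $v:[0,T]\times\mathbb{R}^d\to\mathbb{R}$ with $v(t,\cdot)\in\mathcal{D}$ for all $t\in[0,T]$, $\int_{\mathbb{R}^{d_1}}y\,\rho_t(dy)=\nabla v(t,X(t))$, $\Theta$-almost surely. We denote the collection of all such $\Theta$ by $\mathcal{P}^\nabla_\infty$. Let $\theta\in\mathcal{P}(\mathcal{X})$ be such that, for some $\Theta\in\mathcal{P}^\nabla_\infty$, $\theta=\Theta_{\mathcal{X}}$. Fix such a $\Theta$. Under $\Theta$, the first component $X$ of the canonical process solves

$$dX(t) = b(X(t),\theta(t))\,dt + \nabla v(t,X(t))\,dt + dW(t).$$

For $f\in\mathcal{D}$, applying Itô's formula to $X$, we get

$$f(X(t+h)) - f(X(t)) = \int_t^{t+h}\mathcal{L}(\theta(s))(f)(X(s))\,ds + \int_t^{t+h}\nabla f(X(s))\cdot\nabla v(s,X(s))\,ds + M(t+h) - M(t),$$

where $M$ is a $(\mathcal{G}^\Theta_t)$-martingale under $\Theta$. Taking expectations in the above display (recall that $\langle\theta(t),f\rangle=E_\Theta[f(X(t))]$), dividing by $h$ and sending $h\to0$, we obtain

$$\langle\dot\theta(t)-\mathcal{L}(\theta(t))^*\theta(t),f\rangle = \langle\theta(t),\nabla f\cdot\nabla v(t,\cdot)\rangle, \quad t\in[0,T].$$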

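Then, by the Cauchy-Schwarz inequality,

$$\sup_{f\in\mathcal{D}:\,\langle\theta(t),|\nabla f|^2\rangle\neq0}\frac{\big|\langle\dot\theta(t)-\mathcal{L}(\theta(t))^*\theta(t),f\rangle\big|^2}{\langle\theta(t),|\nabla f|^2\rangle} = \sup_{f\in\mathcal{D}:\,\langle\theta(t),|\nabla f|^2\rangle\neq0}\frac{\big|\langle\theta(t),\nabla f\cdot\nabla v(t,\cdot)\rangle\big|^2}{\langle\theta(t),|\nabla f|^2\rangle} \le \langle\theta(t),|\nabla v(t,\cdot)|^2\rangle = E_\Theta\big[|u(t)|^2\big].$$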

Since the above relation holds for every $\Theta\in\mathcal{P}^\nabla_\infty$ satisfying $\theta=\Theta_{\mathcal{X}}$, we get

$$S(\theta(\cdot)) \le \inf_{\Theta\in\mathcal{P}^\nabla_\infty:\,\Theta_{\mathcal{X}}=\theta}E_\Theta\Big[\frac{1}{2}\int_0^T|u(t)|^2\,dt\Big].$$

A formal relation between the rate functions (7.1) and (7.2) is now apparent. Making this connection more precise requires some work. In particular, one needs to argue that the infimum of the cost in the rate function (7.2) can be restricted to Markov controls of the form $u(t)=\nabla v(t,X(t))$.
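The Cauchy-Schwarz step above can be checked numerically when $\theta(t)$ is a discrete measure. A minimal Python sketch, assuming $\theta(t)=\sum_j w_j\delta_{x_j}$ and that the gradients of $f$ and $v(t,\cdot)$ at the atoms are given as tuples; the function name is our own.

```python
def cauchy_schwarz_check(weights, grad_f, grad_v):
    """Discrete check of the Cauchy-Schwarz step for theta(t) = sum_j w_j delta_{x_j}.

    grad_f[j] and grad_v[j] are the gradient vectors of f and v(t, .) at x_j.
    Returns (ratio, bound) with
      ratio = <theta, grad f . grad v>^2 / <theta, |grad f|^2>,
      bound = <theta, |grad v|^2>,
    so that ratio <= bound, with equality when grad f = grad v.
    """

    def dot(p, q):
        return sum(a * b for a, b in zip(p, q))

    num = sum(w * dot(gf, gv) for w, gf, gv in zip(weights, grad_f, grad_v)) ** 2
    den = sum(w * dot(gf, gf) for w, gf in zip(weights, grad_f))
    bound = sum(w * dot(gv, gv) for w, gv in zip(weights, grad_v))
    return num / den, bound


gv = [(1.0, 0.0), (0.0, 2.0)]
print(cauchy_schwarz_check([0.5, 0.5], gv, gv))  # (2.5, 2.5): equality at f = v
```

Choosing $f=v(t,\cdot)$ attains the bound, which is why the supremum in (7.1) is governed by $\langle\theta(t),|\nabla v(t,\cdot)|^2\rangle$ in this formal computation.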

7.2 Processes with delay

Our approach allows one to treat more general Itô equations than those of diffusion type with very little additional effort. Good examples are SDEs whose coefficients are allowed to depend on the entire past of the state trajectories. Let us make this more precise. Suppose that the coefficients $b$, $\sigma$ are progressive functionals defined on $[0,T]\times\mathcal{X}\times\mathcal{P}(\mathbb{R}^d)$, where we recall that $\mathcal{X}=C([0,T],\mathbb{R}^d)$; that is, $b$, $\sigma$ are Borel measurable and for each $t\in[0,T]$, $b$, $\sigma$ restricted to $[0,t]\times\mathcal{X}\times\mathcal{P}(\mathbb{R}^d)$ are measurable with respect to $\mathcal{B}([0,t])\otimes\mathcal{G}^{\mathcal{X}}_t\otimes\mathcal{B}(\mathcal{P}(\mathbb{R}^d))$, where $\mathcal{G}^{\mathcal{X}}_t$ is the $\sigma$-algebra generated by the coordinate process on $\mathcal{X}$ up to time $t$. Eq. (2.1), the prelimit equation for an individual particle (the $i$-th out of $N$), takes the form

$$dX^{i,N}(t) = b\big(t,X^{i,N},\mu^N(t)\big)\,dt + \sigma\big(t,X^{i,N},\mu^N(t)\big)\,dW^i(t). \tag{7.3}$$

The system of $N$ equations given by (7.3) is a system of stochastic functional differential equations or stochastic delay differential equations (SFDEs or SDDEs). The corresponding uncontrolled limit equation reads

$$dX(t) = b\big(t,X,\mathrm{Law}(X(t))\big)\,dt + \sigma\big(t,X,\mathrm{Law}(X(t))\big)\,dW(t), \tag{7.4}$$

while the controlled versions of (7.3) and (7.4) will be

$$d\bar X^{i,N}(t) = b\big(t,\bar X^{i,N},\bar\mu^N(t)\big)\,dt + \sigma\big(t,\bar X^{i,N},\bar\mu^N(t)\big)\,u_i(t)\,dt + \sigma\big(t,\bar X^{i,N},\bar\mu^N(t)\big)\,dW^i(t), \tag{7.5}$$

$$d\bar X(t) = b\big(t,\bar X,\mathrm{Law}(\bar X(t))\big)\,dt + \int_{\mathbb{R}^{d_1}}\sigma\big(t,\bar X,\mathrm{Law}(\bar X(t))\big)\,y\,\rho_t(dy)\,dt + \sigma\big(t,\bar X,\mathrm{Law}(\bar X(t))\big)\,dW(t), \tag{7.6}$$

respectively. In Eq. (7.5), $u_i$ is the $i$-th component of $u=(u_1,\dots,u_N)$ for some $u\in\mathcal{U}_N$, while $\rho$ in Eq. (7.6) is an adapted $\mathcal{R}_1$-valued random variable as in Eq. (2.7).
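To make the delay setting concrete, the Python sketch below simulates one hypothetical instance of (7.3) with $d=d_1=1$ by an Euler scheme. The drift $b(t,\varphi,\mu)=-(\varphi((t-\delta)\vee0)-\int x\,\mu(dx))$ looks back along the particle's own path, so it is a progressive functional of the trajectory; the choice of $b$, the constant $\sigma$, the grid and the function name are all assumptions for illustration.

```python
import math
import random


def simulate_delay_particles(x0, T, n_steps, delay_steps, sigma=1.0, seed=0):
    """Euler scheme for a hypothetical instance of (7.3) with d = d_1 = 1:
        b(t, phi, mu) = -(phi(max(t - delay, 0)) - int x mu(dx)),  sigma constant.

    The drift looks back `delay_steps` grid steps along the particle's own path,
    so it depends on the trajectory up to the current time, not only on the
    current state. Returns the particle paths on the grid t_j = j * T / n_steps.
    """
    rng = random.Random(seed)
    N = len(x0)
    dt = T / n_steps
    paths = [[xi] for xi in x0]
    for step in range(n_steps):
        mean = sum(p[step] for p in paths) / N  # mean of the empirical measure at t_step
        lag = max(step - delay_steps, 0)        # grid index of t - delay (clamped at 0)
        for p in paths:
            drift = -(p[lag] - mean)
            p.append(p[step] + drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return paths


paths = simulate_delay_particles([0.0] * 20, T=1.0, n_steps=200, delay_steps=20)
```

With `delay_steps=0` the scheme reduces to an ordinary diffusion-type particle system; a positive delay is exactly the kind of path dependence that the local martingale formulation with randomized stopping times is designed to accommodate.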
The Laplace principle can now be established in the same way as above except for two points which need modification. Those are the formulation of the local martingale problem in Section 4 and the continuity assumption (A2). Let us denote by (A3')-(A5') the analogues of Assumptions (A3)-(A5), which are obtained by replacing all references to Equations (2.1), (2.2), (2.3), (2.7) with Equations (7.3), (7.4), (7.5), (7.6), respectively.

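As to the martingale problem, we have to redefine the processes $M_f^\Theta$ and the "generators" $\mathcal{A}_s^\Theta(f)$ according to

$$M_f^\Theta(t,(\varphi,r,w)) = f(\varphi(t),w(t)) - f(\varphi(0),0) - \int_0^t\int_{\mathbb{R}^{d_1}}\mathcal{A}_s^\Theta(f)(\varphi,y,w(s))\,r_s(dy)\,ds,$$

where for $s\in[0,T]$, $\varphi\in\mathcal{X}$, $y,z\in\mathbb{R}^{d_1}$,

$$\begin{aligned}
\mathcal{A}_s^\Theta(f)(\varphi,y,z) &= \big\langle b(s,\varphi,\nu_\Theta(s)) + \sigma(s,\varphi,\nu_\Theta(s))\,y,\ \nabla_x f(\varphi(s),z)\big\rangle + \frac{1}{2}\sum_{j,k=1}^{d}(\sigma\sigma^T)_{jk}(s,\varphi,\nu_\Theta(s))\,\frac{\partial^2 f}{\partial x_j\partial x_k}(\varphi(s),z)\\
&\quad + \frac{1}{2}\sum_{l=1}^{d_1}\frac{\partial^2 f}{\partial z_l^2}(\varphi(s),z) + \sum_{k=1}^{d}\sum_{l=1}^{d_1}\sigma_{kl}(s,\varphi,\nu_\Theta(s))\,\frac{\partial^2 f}{\partial x_k\partial z_l}(\varphi(s),z).
\end{aligned}$$

Notice that the test functions $f$ are still elements of $C^2(\mathbb{R}^d\times\mathbb{R}^{d_1})$. With these redefinitions, Lemma 4.1 continues to hold.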

Assumption (A2) about the continuity of $b$, $\sigma$ has to be modified in order to account for the time dependence and supplemented by a condition of uniform continuity and boundedness, which is automatically satisfied in the diffusion case.

(A2') The functions $b(t,\cdot,\cdot)$, $\sigma(t,\cdot,\cdot)$ are continuous, and uniformly continuous and bounded on sets $B\times P$ whenever $B\subset\mathcal{X}$ is bounded and $P\subset\mathcal{P}(\mathbb{R}^d)$ is compact, uniformly in $t\in[0,T]$.

Define the set $\mathcal{P}_\infty$ of probability measures on $\mathcal{B}(\mathcal{Z})$ as the set $\mathcal{P}_\infty$ in Section 3, replacing reference to Eq. (2.7) with Eq. (7.6). Then the following large deviation (or Laplace) principle holds.

Theorem 7.1. Grant Assumptions (A1), (A2')-(A5'). Then the family of empirical measures $\{\mu^N,N\in\mathbb{N}\}$ associated with Equations (7.3) satisfies the Laplace principle with rate function

$$I(\theta) = \inf_{\Theta\in\mathcal{P}_\infty:\,\Theta_{\mathcal{X}}=\theta}\frac{1}{2}\int_{\mathcal{R}_1}\int_{\mathbb{R}^{d_1}\times[0,T]}|y|^2\,r(dy\times dt)\,\Theta_{\mathcal{R}}(dr).$$



Note that there is also a simpler looking form of the rate function, as in Remark 3.2. The proof of Theorem 7.1 is completely analogous to that of Theorem 3.1 given in Sections 5 and 6. The proof of Lemma 5.2 in particular, and specifically the use of the local martingale problem and randomized stopping times there, was tailored to fit not only the diffusion case, but the case of dynamics with delay as well.

Lastly, note that we could further generalize our model to include the case of coefficients $b$, $\sigma$ which also depend on the past of the empirical process. In this case, $b$, $\sigma$ would be progressive functionals defined on $[0,T]\times\mathcal{X}\times\mathcal{P}(\mathcal{X})$, and a Laplace principle could be established in the same way as before.
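A Appendix

A.1 Local martingales with respect to $(\bar{\mathcal{G}}_t)$ and $(\mathcal{G}_t)$

Let the notation be that of the proof of Lemma 5.2 in Section 5. Let $\Theta\in\mathcal{P}(\mathcal{Z})$, $f\in C^2(\mathbb{R}^d\times\mathbb{R}^{d_1})$, and set $M(t)=M_f^\Theta(t)$, $t\in[0,T]$. Notice that $M$ is a random object defined on $(\mathcal{Z},\mathcal{B}(\mathcal{Z}))$ with values in $C([0,T],\mathbb{R})$, which can be identified with the random object living on $(\bar{\mathcal{Z}},\mathcal{B}(\bar{\mathcal{Z}}))$ given by

$$\mathcal{Z}\times[0,1]\ni(z,a)\mapsto(M(t,z))_{t\in[0,T]}\in C([0,T],\mathbb{R}).$$

Let $k\in\mathbb{N}$. Suppose that $M(\cdot\wedge\tau_k)$ is a martingale under $\bar\Theta=\Theta\times\lambda$ with respect to the canonical filtration $(\bar{\mathcal{G}}_t)$ in $\mathcal{B}(\bar{\mathcal{Z}})$. Set

$$\tau_k^0(z) = \tau_k(z,0), \quad z\in\mathcal{Z}.$$

We claim that $M(\cdot\wedge\tau_k^0)$ is a martingale under $\Theta$ with respect to the canonical filtration $(\mathcal{G}_t)$ in $\mathcal{B}(\mathcal{Z})$.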

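Proof. Since $\tau_k$ is a $(\bar{\mathcal{G}}_t)$-stopping time and $\bar{\mathcal{G}}_t=\mathcal{G}_t\times\mathcal{B}([0,1])$, $t\in[0,T]$, it follows that $\tau_k^0$ is a $(\mathcal{G}_t)$-stopping time. Moreover, $\tau_k^0$ is also a $(\bar{\mathcal{G}}_t)$-stopping time, because $\mathcal{G}_t$ can be identified with $\mathcal{G}_t\times\{\emptyset,[0,1]\}$, $t\in[0,T]$, and $(\mathcal{G}_t\times\{\emptyset,[0,1]\})$ is a subfiltration of $(\bar{\mathcal{G}}_t)$.

Let $s,t\in[0,T]$, $s\le t$. We have to show that

$$E_\Theta\big[M(t\wedge\tau_k^0)\cdot1_Z\big] = E_\Theta\big[M(s\wedge\tau_k^0)\cdot1_Z\big] \quad\text{for all } Z\in\mathcal{G}_s.$$

Since $M(\cdot\wedge\tau_k)$ is a martingale under $\bar\Theta$ with respect to $(\bar{\mathcal{G}}_t)$ and $\tau_k^0$ is also a $(\bar{\mathcal{G}}_t)$-stopping time, it follows that $M(\cdot\wedge\tau_k\wedge\tau_k^0)$ is a martingale under $\bar\Theta$ with respect to $(\bar{\mathcal{G}}_t)$. Yet for all $(z,a)\in\bar{\mathcal{Z}}$,

$$(\tau_k\wedge\tau_k^0)(z,a) = \tau_k(z,a)\wedge\tau_k(z,0) = \tau_k(z,0) = \tau_k^0(z)$$

by construction of $\tau_k$ (the map $a\mapsto\tau_k(z,a)$ is non-decreasing) and definition of $\tau_k^0$. Hence we know that

$$E_{\bar\Theta}\big[M(t\wedge\tau_k^0)\cdot1_{\bar Z}\big] = E_{\bar\Theta}\big[M(s\wedge\tau_k^0)\cdot1_{\bar Z}\big] \quad\text{for all } \bar Z\in\bar{\mathcal{G}}_s.$$

Let $Z\in\mathcal{G}_s$. Then $Z\times[0,1]\in\bar{\mathcal{G}}_s$ and, by Fubini's theorem,

$$\begin{aligned}
E_\Theta\big[M(t\wedge\tau_k^0)\cdot1_Z\big] &= \int_{\mathcal{Z}}M(t\wedge\tau_k^0(z),z)\,1_Z(z)\,\Theta(dz)\\
&= \int_{[0,1]}\int_{\mathcal{Z}}M(t\wedge\tau_k^0(z),z)\,1_{Z\times[0,1]}(z,a)\,\Theta(dz)\,\lambda(da)\\
&= \int_{\mathcal{Z}\times[0,1]}M(t\wedge\tau_k^0(z),z)\,1_{Z\times[0,1]}(z,a)\,\bar\Theta(dz\times da)\\
&= E_{\bar\Theta}\big[M(t\wedge\tau_k^0)\cdot1_{Z\times[0,1]}\big]\\
&= E_{\bar\Theta}\big[M(s\wedge\tau_k^0)\cdot1_{Z\times[0,1]}\big]\\
&= \int_{\mathcal{Z}\times[0,1]}M(s\wedge\tau_k^0(z),z)\,1_{Z\times[0,1]}(z,a)\,\bar\Theta(dz\times da)\\
&= E_\Theta\big[M(s\wedge\tau_k^0)\cdot1_Z\big].
\end{aligned}$$

□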
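The key identity $\tau_k(z,a)\wedge\tau_k(z,0)=\tau_k(z,0)$ rests only on the fact that raising the level $k+a$ can only delay the first time the non-decreasing path $v(z,\cdot)$ exceeds it. A minimal Python check on a grid, assuming a hypothetical continuous, non-decreasing sample path with a plateau; the function name `first_exceedance_time` is our own.

```python
def first_exceedance_time(times, v_values, level, T):
    """Grid version of inf{t : v(t) > level}, with inf(empty) = T (hypothetical helper)."""
    for t, v in zip(times, v_values):
        if v > level:
            return t
    return T


# Hypothetical continuous, non-decreasing grid path v(z, .) on [0, 1] with a plateau.
times = [j / 50 for j in range(51)]
v_values = [3.0 * t if t < 0.4 else (1.2 if t <= 0.6 else 1.2 + 3.0 * (t - 0.6))
            for t in times]
k = 1
tau0 = first_exceedance_time(times, v_values, k + 0.0, 1.0)  # tau_k^0(z) = tau_k(z, 0)
for i in range(11):
    a = i / 10
    # a -> tau_k(z, a) is non-decreasing, hence min(tau_k(z, a), tau_k(z, 0)) = tau_k(z, 0)
    assert min(first_exceedance_time(times, v_values, k + a, 1.0), tau0) == tau0
```

Because the stopped process $M(\cdot\wedge\tau_k\wedge\tau_k^0)$ thus coincides with $M(\cdot\wedge\tau_k^0)$, the martingale property transfers from $\bar\Theta$ to $\Theta$ by integrating out the auxiliary variable $a$.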






